```python
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation as LDA
import plotly_express as px
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

pd.set_option('display.max_colwidth', None)

import configparser
config = configparser.ConfigParser()
config.read('env.ini')
data_home = config['DEFAULT']['data_home']
output_dir = config['DEFAULT']['output_dir']
data_prefix = 'entrepreneur'

colors = "YlGnBu"
ngram_range = (1, 2)
n_terms = 4000
n_topics = 40
max_iter = 20
n_top_terms = 9

OHCO = ['screenplay_id', 'scene_id', 'para_num', 'sent_num', 'token_num']
PARA = OHCO[:3]
SCENE = OHCO[:2]
SCREENPLAY = OHCO[:1]
BAG = SCENE

import warnings
warnings.filterwarnings('ignore')

TOKENS = pd.read_csv(f'{output_dir}/{data_prefix}-TOKEN.csv').set_index(OHCO)
TOKENS.head()
```

| screenplay_id | scene_id | para_num | sent_num | token_num | pos_tuple | pos | token_str | term_str | pos_group |
|---|---|---|---|---|---|---|---|---|---|
| joy | 1 | 0 | 0 | 0 | ('The', 'DT') | DT | The | the | DT |
| joy | 1 | 0 | 0 | 1 | ('kitchen', 'NN') | NN | kitchen | kitchen | NN |
| joy | 1 | 0 | 0 | 2 | ('of', 'IN') | IN | of | of | IN |
| joy | 1 | 0 | 0 | 3 | ('a', 'DT') | DT | a | a | DT |
| joy | 1 | 0 | 0 | 4 | ('drive', 'NN') | NN | drive | drive | NN |
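The OHCO index makes it easy to roll tokens up to any level of the hierarchy. As a minimal sketch (with hypothetical toy tokens, not the actual corpus), grouping on the first two index levels and joining the term strings yields one bag-of-words document per scene:

```python
import pandas as pd

# Hypothetical toy token table mirroring the OHCO levels used for bagging
toks = pd.DataFrame({
    'screenplay_id': ['joy', 'joy', 'joy'],
    'scene_id': [1, 1, 2],
    'term_str': ['kitchen', 'drive', 'door'],
})

# Group on the first two OHCO levels (the BAG = SCENE choice) and join terms
docs = toks.groupby(['screenplay_id', 'scene_id']).term_str.apply(' '.join)
print(docs.loc[('joy', 1)])  # kitchen drive
```

Here `BAG = SCENE` corresponds to grouping on `['screenplay_id', 'scene_id']`, which is what the LDA prep step does with the real `TOKENS` table.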
# Prep for LDA
```python
DOCS = TOKENS[TOKENS.pos.str.match(r'^NNS?$')]\
    .groupby(BAG).term_str\
    .apply(lambda x: ' '.join(map(str, x)))\
    .to_frame()\
    .rename(columns={'term_str': 'doc_str'})

count_engine = CountVectorizer(max_features=n_terms, ngram_range=ngram_range, stop_words='english')
count_model = count_engine.fit_transform(DOCS.doc_str)
TERMS = count_engine.get_feature_names_out()

VOCAB = pd.DataFrame(index=TERMS)
VOCAB.index.name = 'term_str'

DTM = pd.DataFrame(count_model.toarray(), index=DOCS.index, columns=TERMS)
DTM
```

| screenplay_id | scene_id | 05 | 1350000 | aback | ability | absorbing | access | account | accounts | acre | act | ... | youll | youre | youre beat | youre gonna | youre right | youve | yule | zero | òmó | ôem |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| joy | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| joy | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| joy | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| joy | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| joy | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| the_social_network | 572 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| the_social_network | 573 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| the_social_network | 574 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| the_social_network | 575 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1846 rows × 4000 columns
```python
VOCAB['doc_count'] = DTM.astype('bool').astype('int').sum()
DOCS['term_count'] = DTM.sum(1)
DOCS.term_count.describe()
```

```
count    1846.000000
mean       14.210184
std        13.311210
min         0.000000
25%         4.000000
50%        11.000000
75%        21.000000
max       110.000000
Name: term_count, dtype: float64
```
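Note that the minimum is 0: some scene bags retain no noun terms at all after POS filtering and vectorization. A hedged sketch of how such empty rows could be dropped before fitting (the notebook fits on all 1846 rows as-is; the `DOCS` frame here is a hypothetical stand-in):

```python
import pandas as pd

# Hypothetical stand-in for the DOCS frame with its term_count column
DOCS = pd.DataFrame({'doc_str': ['cat dog', '', 'dog dog bird'],
                     'term_count': [2, 0, 3]})

# Keep only bags with at least one counted term
NONEMPTY = DOCS[DOCS.term_count > 0]
print(len(NONEMPTY))  # 2
```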
```python
lda_engine = LDA(n_components=n_topics, max_iter=max_iter, learning_offset=50., random_state=0)
TNAMES = [f"T{str(x).zfill(len(str(n_topics)))}" for x in range(n_topics)]
lda_model = lda_engine.fit_transform(count_model)
```

## THETA
```python
THETA = pd.DataFrame(lda_model, index=DOCS.index)
THETA.columns.name = 'topic_id'
THETA.columns = TNAMES
THETA.sample(10).T.style.background_gradient(cmap=colors, axis=None)
```

| screenplay_id | the_big_short | the_social_network | the_help | steve_jobs | the_founder | steve_jobs | the_founder |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|
| scene_id | 5 | 485 | 418 | 253 | 742 | 655 | 661 | 166 | 810 | 9 |
| T00 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T01 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.510390 | 0.005000 |
| T02 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T03 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T04 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.860714 | 0.001471 | 0.005000 |
| T05 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T06 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T07 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T08 | 0.000481 | 0.434657 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T09 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.351972 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T10 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T11 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T12 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T13 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T14 | 0.000481 | 0.497486 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T15 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T16 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T17 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.433728 | 0.005000 |
| T18 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T19 | 0.981250 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T20 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T21 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T22 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T23 | 0.000481 | 0.001786 | 0.025000 | 0.805000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T24 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T25 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T26 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T27 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.512500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T28 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T29 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.805000 |
| T30 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T31 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T32 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T33 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T34 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.878125 | 0.003571 | 0.001471 | 0.005000 |
| T35 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T36 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T37 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T38 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.512314 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
| T39 | 0.000481 | 0.001786 | 0.025000 | 0.005000 | 0.012500 | 0.003571 | 0.003125 | 0.003571 | 0.001471 | 0.005000 |
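Each THETA column above (one per sampled scene) is a topic mixture that sums to 1, because scikit-learn's `fit_transform` returns normalized document–topic distributions. A self-contained sketch of that invariant on synthetic counts (hypothetical data, just to illustrate):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation as LDA

# Tiny synthetic document-term matrix: 6 docs x 5 terms
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(6, 5))

# fit_transform returns one normalized topic distribution per document
theta = LDA(n_components=3, random_state=0).fit_transform(X)
print(np.allclose(theta.sum(axis=1), 1.0))  # True
```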
## PHI
```python
PHI = pd.DataFrame(lda_engine.components_, columns=TERMS, index=TNAMES)
PHI.index.name = 'topic_id'
PHI.columns.name = 'term_str'
PHI.T.sample(10).style.background_gradient(cmap=colors, axis=None)
```

| topic_id | T00 | T01 | T02 | T03 | T04 | T05 | T06 | T07 | T08 | T09 | T10 | T11 | T12 | T13 | T14 | T15 | T16 | T17 | T18 | T19 | T20 | T21 | T22 | T23 | T24 | T25 | T26 | T27 | T28 | T29 | T30 | T31 | T32 | T33 | T34 | T35 | T36 | T37 | T38 | T39 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| term_str | ||||||||||||||||||||||||||||||||||||||||
| plan | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 3.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 3.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 |
| dining | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 4.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 5.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 |
| jared | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 4.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 |
| stories skeeter | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 |
| dress mother | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 |
| pennies heaven | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 4.025000 | 0.025000 |
| shots | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 4.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 0.025000 |
| ship date | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 |
| drift busy | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 |
| bags | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 2.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 0.025000 | 1.025000 | 0.025000 |
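The values in PHI are not probabilities: `components_` holds unnormalized topic–word pseudo-counts, floored by the `topic_word_prior` (which defaults to `1/n_components`, hence the 0.025 baseline visible above with `n_topics = 40`). Normalizing each row gives P(term | topic); a sketch on synthetic counts:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation as LDA

# Tiny synthetic document-term matrix, purely illustrative
rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(6, 5))

lda = LDA(n_components=3, random_state=0).fit(X)

# components_ rows are pseudo-counts; dividing by the row sum yields
# a proper per-topic distribution over terms
phi_prob = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
print(np.allclose(phi_prob.sum(axis=1), 1.0))  # True
```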
## TOPICS
```python
TOPICS = PHI.stack().groupby('topic_id')\
    .apply(lambda x: ' '.join(x.sort_values(ascending=False).head(n_top_terms).reset_index().term_str))\
    .to_frame('top_terms')
TOPICS.head()
```

|  | top_terms |
|---|---|
| topic_id | |
| T00 | swaps sign place glass loans dont money golf default swaps |
| T01 | right people book hand vo way arches building deal |
| T02 | room kitchen living living room people house time way world |
| T03 | vo school computer people money gonna brother right room |
| T04 | contd sorry money men youre summers sir skeeter today |
```python
TOPICS['doc_weight_sum'] = THETA.sum()
TOPICS['term_freq'] = PHI.sum(1) / PHI.sum(1).sum()
TOPICS.sort_values('doc_weight_sum', ascending=False).style.background_gradient(cmap=colors)
```

|  | top_terms | doc_weight_sum | term_freq |
|---|---|---|---|
| topic_id | |||
| T27 | continued lot arches angles parking lot parking cover restaurant things | 118.237774 | 0.025378 |
| T19 | bonds mortgage banks car mortgage bonds housing market mortgages shit | 62.955227 | 0.042527 |
| T30 | computer room moment door skeeter conference hand hands time | 57.578945 | 0.029280 |
| T25 | yule skeeter os door map phone table book desk | 55.382689 | 0.026782 |
| T09 | door board house way man world time bed record | 53.224913 | 0.028538 |
| T15 | people end dress vo day right way business party | 52.469261 | 0.027682 |
| T20 | girls kitchen door room skeeter guys morning nods end | 52.465449 | 0.030366 |
| T16 | os phone time right people ll door way bank | 52.104587 | 0.030608 |
| T06 | thing gonna business people beat time summers youre milkshake | 49.394011 | 0.032061 |
| T04 | contd sorry money men youre summers sir skeeter today | 47.866102 | 0.026326 |
| T13 | gage time team share lot room ownership share ownership computer | 47.560548 | 0.021216 |
| T39 | car living money people time room table bedroom job | 47.529435 | 0.024985 |
| T07 | food lawyer page beat cut face chair drive chicken | 47.381610 | 0.024287 |
| T23 | house beat money os hand line country water mother | 46.924828 | 0.028691 |
| T31 | phone things losses mortgage hand swaps people bond department | 46.836768 | 0.025905 |
| T37 | door stairs race day hallway people room map thats | 46.691889 | 0.023187 |
| T05 | phone os couple hour look years thing car burger | 46.445381 | 0.030379 |
| T03 | vo school computer people money gonna brother right room | 46.253560 | 0.029903 |
| T02 | room kitchen living living room people house time way world | 44.456282 | 0.024038 |
| T35 | door bathroom toilet computer os letter deal bathroom door money | 44.089767 | 0.023956 |
| T38 | door office time piano years people pool bag desks | 43.968562 | 0.025484 |
| T22 | people time lot shirt site theyre way end bet | 43.367253 | 0.029525 |
| T33 | loan yule head purse home officer police house things | 43.234263 | 0.023981 |
| T32 | company time party kind thing phone change building table | 42.984995 | 0.020344 |
| T12 | phone vo cell cell phone office shakes beat milk milk shakes | 42.073192 | 0.027820 |
| T18 | eyes end hand skeeter man kitchen line restaurant town | 42.030787 | 0.028389 |
| T01 | right people book hand vo way arches building deal | 42.022960 | 0.020859 |
| T10 | chicken bun kitchen way room face product smile skeeter | 41.701578 | 0.021272 |
| T34 | arches place house hand vanilla yule things thing pair | 40.998628 | 0.027419 |
| T11 | letter business right time hell money summers table sorry | 40.799953 | 0.026187 |
| T08 | land house beat years bed sound minutes sure ticket | 39.527795 | 0.017747 |
| T00 | swaps sign place glass loans dont money golf default swaps | 38.663781 | 0.026031 |
| T36 | tray table cover smile sight employees glance men time | 36.700751 | 0.022375 |
| T26 | thing idea beat support car skeeter coat silence right | 36.357623 | 0.019120 |
| T29 | line finger people years days food end store restaurant | 36.117518 | 0.016992 |
| T17 | room idea launch slots test people members company hundreds | 36.032378 | 0.020834 |
| T21 | dollars session head drums cream cooler ice cream ice businessman | 35.242060 | 0.018704 |
| T28 | table os room stock screen bottle glass shares people | 34.537674 | 0.018273 |
| T14 | time right stand man dog size left fight office | 34.077687 | 0.020247 |
| T24 | news late cheerleaders things woman store window drinks eduardo | 29.711536 | 0.012302 |
```python
cols = TOPICS.columns[1:].to_list()  # ['doc_weight_sum', 'term_freq']

# iloc[0, 1] is the correlation between the two columns
# (iloc[1, 1] would grab a diagonal entry, which is always 1.0)
r = TOPICS[cols].corr().iloc[0, 1]
TOPICS.plot.scatter(*cols, title=f"r={r}");

TOPICS.sort_values('doc_weight_sum', ascending=True)\
    .plot.barh(y='doc_weight_sum', x='top_terms', figsize=(5, n_topics/2));
```

## LDA + Visualizations
```python
LIB = pd.read_csv(f'{output_dir}/{data_prefix}-LIB.csv').set_index('screenplay_id')
LIB['title_key'] = LIB.raw_title.str.split(', ').str[0].str.lower()
TITLES = sorted(LIB.title_key.value_counts().index.to_list())

TOPICS[TITLES] = THETA.join(LIB, on='screenplay_id').groupby('title_key')[TNAMES].mean().T
TOPICS[TITLES + ['top_terms']].style.background_gradient(cmap=colors, axis=None)
```

|  | joy | steve jobs | the big short | the founder | the help | the social network | top_terms |
|---|---|---|---|---|---|---|---|
| topic_id | |||||||
| T00 | 0.026062 | 0.012994 | 0.034163 | 0.029206 | 0.018490 | 0.016646 | swaps sign place glass loans dont money golf default swaps |
| T01 | 0.014143 | 0.022036 | 0.024812 | 0.017367 | 0.031259 | 0.024446 | right people book hand vo way arches building deal |
| T02 | 0.024674 | 0.025316 | 0.027253 | 0.023974 | 0.033112 | 0.011998 | room kitchen living living room people house time way world |
| T03 | 0.011629 | 0.032663 | 0.039859 | 0.011350 | 0.014803 | 0.034096 | vo school computer people money gonna brother right room |
| T04 | 0.024664 | 0.020378 | 0.025004 | 0.023964 | 0.024454 | 0.037505 | contd sorry money men youre summers sir skeeter today |
| T05 | 0.027775 | 0.028449 | 0.025132 | 0.028991 | 0.016570 | 0.024112 | phone os couple hour look years thing car burger |
| T06 | 0.024883 | 0.029924 | 0.018924 | 0.024176 | 0.013333 | 0.042491 | thing gonna business people beat time summers youre milkshake |
| T07 | 0.032812 | 0.013282 | 0.009825 | 0.035572 | 0.023684 | 0.041719 | food lawyer page beat cut face chair drive chicken |
| T08 | 0.033579 | 0.020178 | 0.019383 | 0.036120 | 0.008286 | 0.017613 | land house beat years bed sound minutes sure ticket |
| T09 | 0.014145 | 0.047853 | 0.031476 | 0.015724 | 0.027932 | 0.021780 | door board house way man world time bed record |
| T10 | 0.024401 | 0.015546 | 0.023332 | 0.023710 | 0.023875 | 0.028400 | chicken bun kitchen way room face product smile skeeter |
| T11 | 0.022318 | 0.013515 | 0.014945 | 0.023672 | 0.035683 | 0.024285 | letter business right time hell money summers table sorry |
| T12 | 0.020288 | 0.008833 | 0.046225 | 0.019729 | 0.025477 | 0.028899 | phone vo cell cell phone office shakes beat milk milk shakes |
| T13 | 0.015877 | 0.025338 | 0.016032 | 0.015461 | 0.026719 | 0.045244 | gage time team share lot room ownership share ownership computer |
| T14 | 0.022591 | 0.016669 | 0.013812 | 0.021958 | 0.009169 | 0.026822 | time right stand man dog size left fight office |
| T15 | 0.016215 | 0.023402 | 0.048796 | 0.015788 | 0.038509 | 0.030980 | people end dress vo day right way business party |
| T16 | 0.020478 | 0.033898 | 0.041566 | 0.019913 | 0.031401 | 0.020871 | os phone time right people ll door way bank |
| T17 | 0.018474 | 0.029078 | 0.013033 | 0.017975 | 0.012779 | 0.018512 | room idea launch slots test people members company hundreds |
| T18 | 0.030413 | 0.010789 | 0.005931 | 0.035517 | 0.041268 | 0.017908 | eyes end hand skeeter man kitchen line restaurant town |
| T19 | 0.035656 | 0.013304 | 0.125312 | 0.034602 | 0.023848 | 0.015376 | bonds mortgage banks car mortgage bonds housing market mortgages shit |
| T20 | 0.025562 | 0.020354 | 0.018860 | 0.025871 | 0.047382 | 0.031439 | girls kitchen door room skeeter guys morning nods end |
| T21 | 0.025985 | 0.009411 | 0.032583 | 0.025242 | 0.017884 | 0.016087 | dollars session head drums cream cooler ice cream ice businessman |
| T22 | 0.010604 | 0.023082 | 0.020025 | 0.010358 | 0.022144 | 0.045332 | people time lot shirt site theyre way end bet |
| T23 | 0.033485 | 0.018882 | 0.031301 | 0.033236 | 0.024106 | 0.020884 | house beat money os hand line country water mother |
| T24 | 0.020953 | 0.011309 | 0.021547 | 0.020373 | 0.006978 | 0.021231 | news late cheerleaders things woman store window drinks eduardo |
| T25 | 0.030883 | 0.009680 | 0.013166 | 0.029983 | 0.067089 | 0.032981 | yule skeeter os door map phone table book desk |
| T26 | 0.032564 | 0.015063 | 0.010384 | 0.031609 | 0.020522 | 0.013597 | thing idea beat support car skeeter coat silence right |
| T27 | 0.025960 | 0.191956 | 0.009319 | 0.025218 | 0.014161 | 0.023927 | continued lot arches angles parking lot parking cover restaurant things |
| T28 | 0.020003 | 0.023017 | 0.016693 | 0.019454 | 0.011275 | 0.019501 | table os room stock screen bottle glass shares people |
| T29 | 0.028367 | 0.018597 | 0.010126 | 0.027548 | 0.019728 | 0.014765 | line finger people years days food end store restaurant |
| T30 | 0.029478 | 0.030842 | 0.018258 | 0.028623 | 0.040379 | 0.034004 | computer room moment door skeeter conference hand hands time |
| T31 | 0.022496 | 0.015631 | 0.068883 | 0.021866 | 0.017662 | 0.023845 | phone things losses mortgage hand swaps people bond department |
| T32 | 0.025468 | 0.021402 | 0.010975 | 0.024743 | 0.009370 | 0.043314 | company time party kind thing phone change building table |
| T33 | 0.030723 | 0.023898 | 0.007847 | 0.029828 | 0.035636 | 0.011515 | loan yule head purse home officer police house things |
| T34 | 0.025884 | 0.022744 | 0.009850 | 0.025145 | 0.025673 | 0.021171 | arches place house hand vanilla yule things thing pair |
| T35 | 0.029846 | 0.021929 | 0.021063 | 0.028980 | 0.028423 | 0.016429 | door bathroom toilet computer os letter deal bathroom door money |
| T36 | 0.037666 | 0.011187 | 0.009207 | 0.036547 | 0.019457 | 0.014496 | tray table cover smile sight employees glance men time |
| T37 | 0.025072 | 0.022732 | 0.023155 | 0.024360 | 0.029413 | 0.027064 | door stairs race day hallway people room map thats |
| T38 | 0.017601 | 0.021155 | 0.024288 | 0.017130 | 0.038474 | 0.022705 | door office time piano years people pool bag desks |
| T39 | 0.040322 | 0.023682 | 0.017657 | 0.039117 | 0.023590 | 0.016010 | car living money people time room table bedroom job |
```python
TOPICS['title'] = TOPICS[TITLES].idxmax(1)
TOPICS.sort_values(['title', 'doc_weight_sum'], ascending=[True, False]).style.background_gradient(cmap=colors)
```

|  | top_terms | doc_weight_sum | term_freq | joy | steve jobs | the big short | the founder | the help | the social network | title |
|---|---|---|---|---|---|---|---|---|---|---|
| topic_id | ||||||||||
| T39 | car living money people time room table bedroom job | 47.529435 | 0.024985 | 0.040322 | 0.023682 | 0.017657 | 0.039117 | 0.023590 | 0.016010 | joy |
| T23 | house beat money os hand line country water mother | 46.924828 | 0.028691 | 0.033485 | 0.018882 | 0.031301 | 0.033236 | 0.024106 | 0.020884 | joy |
| T35 | door bathroom toilet computer os letter deal bathroom door money | 44.089767 | 0.023956 | 0.029846 | 0.021929 | 0.021063 | 0.028980 | 0.028423 | 0.016429 | joy |
| T34 | arches place house hand vanilla yule things thing pair | 40.998628 | 0.027419 | 0.025884 | 0.022744 | 0.009850 | 0.025145 | 0.025673 | 0.021171 | joy |
| T36 | tray table cover smile sight employees glance men time | 36.700751 | 0.022375 | 0.037666 | 0.011187 | 0.009207 | 0.036547 | 0.019457 | 0.014496 | joy |
| T26 | thing idea beat support car skeeter coat silence right | 36.357623 | 0.019120 | 0.032564 | 0.015063 | 0.010384 | 0.031609 | 0.020522 | 0.013597 | joy |
| T29 | line finger people years days food end store restaurant | 36.117518 | 0.016992 | 0.028367 | 0.018597 | 0.010126 | 0.027548 | 0.019728 | 0.014765 | joy |
| T27 | continued lot arches angles parking lot parking cover restaurant things | 118.237774 | 0.025378 | 0.025960 | 0.191956 | 0.009319 | 0.025218 | 0.014161 | 0.023927 | steve jobs |
| T09 | door board house way man world time bed record | 53.224913 | 0.028538 | 0.014145 | 0.047853 | 0.031476 | 0.015724 | 0.027932 | 0.021780 | steve jobs |
| T17 | room idea launch slots test people members company hundreds | 36.032378 | 0.020834 | 0.018474 | 0.029078 | 0.013033 | 0.017975 | 0.012779 | 0.018512 | steve jobs |
| T28 | table os room stock screen bottle glass shares people | 34.537674 | 0.018273 | 0.020003 | 0.023017 | 0.016693 | 0.019454 | 0.011275 | 0.019501 | steve jobs |
| T19 | bonds mortgage banks car mortgage bonds housing market mortgages shit | 62.955227 | 0.042527 | 0.035656 | 0.013304 | 0.125312 | 0.034602 | 0.023848 | 0.015376 | the big short |
| T15 | people end dress vo day right way business party | 52.469261 | 0.027682 | 0.016215 | 0.023402 | 0.048796 | 0.015788 | 0.038509 | 0.030980 | the big short |
| T16 | os phone time right people ll door way bank | 52.104587 | 0.030608 | 0.020478 | 0.033898 | 0.041566 | 0.019913 | 0.031401 | 0.020871 | the big short |
| T31 | phone things losses mortgage hand swaps people bond department | 46.836768 | 0.025905 | 0.022496 | 0.015631 | 0.068883 | 0.021866 | 0.017662 | 0.023845 | the big short |
| T03 | vo school computer people money gonna brother right room | 46.253560 | 0.029903 | 0.011629 | 0.032663 | 0.039859 | 0.011350 | 0.014803 | 0.034096 | the big short |
| T12 | phone vo cell cell phone office shakes beat milk milk shakes | 42.073192 | 0.027820 | 0.020288 | 0.008833 | 0.046225 | 0.019729 | 0.025477 | 0.028899 | the big short |
| T00 | swaps sign place glass loans dont money golf default swaps | 38.663781 | 0.026031 | 0.026062 | 0.012994 | 0.034163 | 0.029206 | 0.018490 | 0.016646 | the big short |
| T21 | dollars session head drums cream cooler ice cream ice businessman | 35.242060 | 0.018704 | 0.025985 | 0.009411 | 0.032583 | 0.025242 | 0.017884 | 0.016087 | the big short |
| T24 | news late cheerleaders things woman store window drinks eduardo | 29.711536 | 0.012302 | 0.020953 | 0.011309 | 0.021547 | 0.020373 | 0.006978 | 0.021231 | the big short |
| T05 | phone os couple hour look years thing car burger | 46.445381 | 0.030379 | 0.027775 | 0.028449 | 0.025132 | 0.028991 | 0.016570 | 0.024112 | the founder |
| T08 | land house beat years bed sound minutes sure ticket | 39.527795 | 0.017747 | 0.033579 | 0.020178 | 0.019383 | 0.036120 | 0.008286 | 0.017613 | the founder |
| T30 | computer room moment door skeeter conference hand hands time | 57.578945 | 0.029280 | 0.029478 | 0.030842 | 0.018258 | 0.028623 | 0.040379 | 0.034004 | the help |
| T25 | yule skeeter os door map phone table book desk | 55.382689 | 0.026782 | 0.030883 | 0.009680 | 0.013166 | 0.029983 | 0.067089 | 0.032981 | the help |
| T20 | girls kitchen door room skeeter guys morning nods end | 52.465449 | 0.030366 | 0.025562 | 0.020354 | 0.018860 | 0.025871 | 0.047382 | 0.031439 | the help |
| T37 | door stairs race day hallway people room map thats | 46.691889 | 0.023187 | 0.025072 | 0.022732 | 0.023155 | 0.024360 | 0.029413 | 0.027064 | the help |
| T02 | room kitchen living living room people house time way world | 44.456282 | 0.024038 | 0.024674 | 0.025316 | 0.027253 | 0.023974 | 0.033112 | 0.011998 | the help |
| T38 | door office time piano years people pool bag desks | 43.968562 | 0.025484 | 0.017601 | 0.021155 | 0.024288 | 0.017130 | 0.038474 | 0.022705 | the help |
| T33 | loan yule head purse home officer police house things | 43.234263 | 0.023981 | 0.030723 | 0.023898 | 0.007847 | 0.029828 | 0.035636 | 0.011515 | the help |
| T18 | eyes end hand skeeter man kitchen line restaurant town | 42.030787 | 0.028389 | 0.030413 | 0.010789 | 0.005931 | 0.035517 | 0.041268 | 0.017908 | the help |
| T01 | right people book hand vo way arches building deal | 42.022960 | 0.020859 | 0.014143 | 0.022036 | 0.024812 | 0.017367 | 0.031259 | 0.024446 | the help |
| T11 | letter business right time hell money summers table sorry | 40.799953 | 0.026187 | 0.022318 | 0.013515 | 0.014945 | 0.023672 | 0.035683 | 0.024285 | the help |
| T06 | thing gonna business people beat time summers youre milkshake | 49.394011 | 0.032061 | 0.024883 | 0.029924 | 0.018924 | 0.024176 | 0.013333 | 0.042491 | the social network |
| T04 | contd sorry money men youre summers sir skeeter today | 47.866102 | 0.026326 | 0.024664 | 0.020378 | 0.025004 | 0.023964 | 0.024454 | 0.037505 | the social network |
| T13 | gage time team share lot room ownership share ownership computer | 47.560548 | 0.021216 | 0.015877 | 0.025338 | 0.016032 | 0.015461 | 0.026719 | 0.045244 | the social network |
| T07 | food lawyer page beat cut face chair drive chicken | 47.381610 | 0.024287 | 0.032812 | 0.013282 | 0.009825 | 0.035572 | 0.023684 | 0.041719 | the social network |
| T22 | people time lot shirt site theyre way end bet | 43.367253 | 0.029525 | 0.010604 | 0.023082 | 0.020025 | 0.010358 | 0.022144 | 0.045332 | the social network |
| T32 | company time party kind thing phone change building table | 42.984995 | 0.020344 | 0.025468 | 0.021402 | 0.010975 | 0.024743 | 0.009370 | 0.043314 | the social network |
| T10 | chicken bun kitchen way room face product smile skeeter | 41.701578 | 0.021272 | 0.024401 | 0.015546 | 0.023332 | 0.023710 | 0.023875 | 0.028400 | the social network |
| T14 | time right stand man dog size left fight office | 34.077687 | 0.020247 | 0.022591 | 0.016669 | 0.013812 | 0.021958 | 0.009169 | 0.026822 | the social network |
from scipy.spatial.distance import pdist

tpairs_idx = [(a, b) for a, b in pd.MultiIndex.from_product([TOPICS.index, TOPICS.index]) if a < b]
TPAIRS = pd.DataFrame(tpairs_idx, columns=['topic_id_x', 'topic_id_y']).set_index(['topic_id_x', 'topic_id_y'])

TPAIRS['theta_cityblock'] = pdist(THETA.T, 'cityblock')
TPAIRS['theta_cosine'] = pdist(THETA.T, 'cosine')
TPAIRS['theta_canberra'] = pdist(THETA.T, 'canberra')
TPAIRS['theta_jaccard'] = pdist(THETA.T, 'jaccard')
TPAIRS['theta_js'] = pdist(THETA.T, 'jensenshannon')
TPAIRS['phi_cityblock'] = pdist(PHI, 'cityblock')
TPAIRS['phi_cosine'] = pdist(PHI, 'cosine')
TPAIRS['phi_canberra'] = pdist(PHI, 'canberra')
TPAIRS['phi_jaccard'] = pdist(PHI, 'jaccard')
TPAIRS['phi_js'] = pdist(PHI, 'jensenshannon')

import pandas as pd
import numpy as np
import plotly_express as px
import seaborn as sns; sns.set()

sns.pairplot(TPAIRS);

PHI PCA¶
pca_engine_phi = PCA(4)
PHI_COMPS = pd.DataFrame(pca_engine_phi.fit_transform(normalize(PHI, norm='l2', axis=1)), index=PHI.index)

TOPICS
| top_terms | doc_weight_sum | term_freq | joy | steve jobs | the big short | the founder | the help | the social network | title | |
|---|---|---|---|---|---|---|---|---|---|---|
| topic_id | ||||||||||
| T00 | swaps sign place glass loans dont money golf default swaps | 38.663781 | 0.026031 | 0.026062 | 0.012994 | 0.034163 | 0.029206 | 0.018490 | 0.016646 | the big short |
| T01 | right people book hand vo way arches building deal | 42.022960 | 0.020859 | 0.014143 | 0.022036 | 0.024812 | 0.017367 | 0.031259 | 0.024446 | the help |
| T02 | room kitchen living living room people house time way world | 44.456282 | 0.024038 | 0.024674 | 0.025316 | 0.027253 | 0.023974 | 0.033112 | 0.011998 | the help |
| T03 | vo school computer people money gonna brother right room | 46.253560 | 0.029903 | 0.011629 | 0.032663 | 0.039859 | 0.011350 | 0.014803 | 0.034096 | the big short |
| T04 | contd sorry money men youre summers sir skeeter today | 47.866102 | 0.026326 | 0.024664 | 0.020378 | 0.025004 | 0.023964 | 0.024454 | 0.037505 | the social network |
| T05 | phone os couple hour look years thing car burger | 46.445381 | 0.030379 | 0.027775 | 0.028449 | 0.025132 | 0.028991 | 0.016570 | 0.024112 | the founder |
| T06 | thing gonna business people beat time summers youre milkshake | 49.394011 | 0.032061 | 0.024883 | 0.029924 | 0.018924 | 0.024176 | 0.013333 | 0.042491 | the social network |
| T07 | food lawyer page beat cut face chair drive chicken | 47.381610 | 0.024287 | 0.032812 | 0.013282 | 0.009825 | 0.035572 | 0.023684 | 0.041719 | the social network |
| T08 | land house beat years bed sound minutes sure ticket | 39.527795 | 0.017747 | 0.033579 | 0.020178 | 0.019383 | 0.036120 | 0.008286 | 0.017613 | the founder |
| T09 | door board house way man world time bed record | 53.224913 | 0.028538 | 0.014145 | 0.047853 | 0.031476 | 0.015724 | 0.027932 | 0.021780 | steve jobs |
| T10 | chicken bun kitchen way room face product smile skeeter | 41.701578 | 0.021272 | 0.024401 | 0.015546 | 0.023332 | 0.023710 | 0.023875 | 0.028400 | the social network |
| T11 | letter business right time hell money summers table sorry | 40.799953 | 0.026187 | 0.022318 | 0.013515 | 0.014945 | 0.023672 | 0.035683 | 0.024285 | the help |
| T12 | phone vo cell cell phone office shakes beat milk milk shakes | 42.073192 | 0.027820 | 0.020288 | 0.008833 | 0.046225 | 0.019729 | 0.025477 | 0.028899 | the big short |
| T13 | gage time team share lot room ownership share ownership computer | 47.560548 | 0.021216 | 0.015877 | 0.025338 | 0.016032 | 0.015461 | 0.026719 | 0.045244 | the social network |
| T14 | time right stand man dog size left fight office | 34.077687 | 0.020247 | 0.022591 | 0.016669 | 0.013812 | 0.021958 | 0.009169 | 0.026822 | the social network |
| T15 | people end dress vo day right way business party | 52.469261 | 0.027682 | 0.016215 | 0.023402 | 0.048796 | 0.015788 | 0.038509 | 0.030980 | the big short |
| T16 | os phone time right people ll door way bank | 52.104587 | 0.030608 | 0.020478 | 0.033898 | 0.041566 | 0.019913 | 0.031401 | 0.020871 | the big short |
| T17 | room idea launch slots test people members company hundreds | 36.032378 | 0.020834 | 0.018474 | 0.029078 | 0.013033 | 0.017975 | 0.012779 | 0.018512 | steve jobs |
| T18 | eyes end hand skeeter man kitchen line restaurant town | 42.030787 | 0.028389 | 0.030413 | 0.010789 | 0.005931 | 0.035517 | 0.041268 | 0.017908 | the help |
| T19 | bonds mortgage banks car mortgage bonds housing market mortgages shit | 62.955227 | 0.042527 | 0.035656 | 0.013304 | 0.125312 | 0.034602 | 0.023848 | 0.015376 | the big short |
| T20 | girls kitchen door room skeeter guys morning nods end | 52.465449 | 0.030366 | 0.025562 | 0.020354 | 0.018860 | 0.025871 | 0.047382 | 0.031439 | the help |
| T21 | dollars session head drums cream cooler ice cream ice businessman | 35.242060 | 0.018704 | 0.025985 | 0.009411 | 0.032583 | 0.025242 | 0.017884 | 0.016087 | the big short |
| T22 | people time lot shirt site theyre way end bet | 43.367253 | 0.029525 | 0.010604 | 0.023082 | 0.020025 | 0.010358 | 0.022144 | 0.045332 | the social network |
| T23 | house beat money os hand line country water mother | 46.924828 | 0.028691 | 0.033485 | 0.018882 | 0.031301 | 0.033236 | 0.024106 | 0.020884 | joy |
| T24 | news late cheerleaders things woman store window drinks eduardo | 29.711536 | 0.012302 | 0.020953 | 0.011309 | 0.021547 | 0.020373 | 0.006978 | 0.021231 | the big short |
| T25 | yule skeeter os door map phone table book desk | 55.382689 | 0.026782 | 0.030883 | 0.009680 | 0.013166 | 0.029983 | 0.067089 | 0.032981 | the help |
| T26 | thing idea beat support car skeeter coat silence right | 36.357623 | 0.019120 | 0.032564 | 0.015063 | 0.010384 | 0.031609 | 0.020522 | 0.013597 | joy |
| T27 | continued lot arches angles parking lot parking cover restaurant things | 118.237774 | 0.025378 | 0.025960 | 0.191956 | 0.009319 | 0.025218 | 0.014161 | 0.023927 | steve jobs |
| T28 | table os room stock screen bottle glass shares people | 34.537674 | 0.018273 | 0.020003 | 0.023017 | 0.016693 | 0.019454 | 0.011275 | 0.019501 | steve jobs |
| T29 | line finger people years days food end store restaurant | 36.117518 | 0.016992 | 0.028367 | 0.018597 | 0.010126 | 0.027548 | 0.019728 | 0.014765 | joy |
| T30 | computer room moment door skeeter conference hand hands time | 57.578945 | 0.029280 | 0.029478 | 0.030842 | 0.018258 | 0.028623 | 0.040379 | 0.034004 | the help |
| T31 | phone things losses mortgage hand swaps people bond department | 46.836768 | 0.025905 | 0.022496 | 0.015631 | 0.068883 | 0.021866 | 0.017662 | 0.023845 | the big short |
| T32 | company time party kind thing phone change building table | 42.984995 | 0.020344 | 0.025468 | 0.021402 | 0.010975 | 0.024743 | 0.009370 | 0.043314 | the social network |
| T33 | loan yule head purse home officer police house things | 43.234263 | 0.023981 | 0.030723 | 0.023898 | 0.007847 | 0.029828 | 0.035636 | 0.011515 | the help |
| T34 | arches place house hand vanilla yule things thing pair | 40.998628 | 0.027419 | 0.025884 | 0.022744 | 0.009850 | 0.025145 | 0.025673 | 0.021171 | joy |
| T35 | door bathroom toilet computer os letter deal bathroom door money | 44.089767 | 0.023956 | 0.029846 | 0.021929 | 0.021063 | 0.028980 | 0.028423 | 0.016429 | joy |
| T36 | tray table cover smile sight employees glance men time | 36.700751 | 0.022375 | 0.037666 | 0.011187 | 0.009207 | 0.036547 | 0.019457 | 0.014496 | joy |
| T37 | door stairs race day hallway people room map thats | 46.691889 | 0.023187 | 0.025072 | 0.022732 | 0.023155 | 0.024360 | 0.029413 | 0.027064 | the help |
| T38 | door office time piano years people pool bag desks | 43.968562 | 0.025484 | 0.017601 | 0.021155 | 0.024288 | 0.017130 | 0.038474 | 0.022705 | the help |
| T39 | car living money people time room table bedroom job | 47.529435 | 0.024985 | 0.040322 | 0.023682 | 0.017657 | 0.039117 | 0.023590 | 0.016010 | joy |
import matplotlib.pyplot as plt
import seaborn as sns

# Prepare data
df = PHI_COMPS.reset_index().copy()
df['size'] = TOPICS['term_freq'].values
df['color'] = TOPICS['title'].values
df['hover'] = TOPICS['doc_weight_sum'].values

plt.figure(figsize=(10, 8))

# Create scatter plot
sns.scatterplot(
    data=df, x=0, y=1,
    hue='color', size='size', sizes=(20, 400),
    alpha=0.6, legend='brief'
)

# Add topic_id labels near points
for _, row in df.iterrows():
    plt.text(row[0]+0.1, row[1]+0.1, str(row['topic_id']), fontsize=9)

plt.xlabel("Component 0")
plt.ylabel("Component 1")
plt.title("Topic Scatter by Term Frequency and Document Weight")
plt.tight_layout()
plt.show()

px.scatter(PHI_COMPS.reset_index(), 0, 1, size=TOPICS.term_freq, color=TOPICS.title,
           text='topic_id', hover_name=TOPICS.doc_weight_sum, height=600, width=700)

PHI_LOADINGS = pd.DataFrame(pca_engine_phi.components_.T * np.sqrt(pca_engine_phi.explained_variance_), index=PHI.T.index)
PHI_LOADINGS.index.name = 'term_str'

import matplotlib.pyplot as plt
import seaborn as sns

# Prepare data
df = PHI_LOADINGS.reset_index()

plt.figure(figsize=(10, 8))  # Approx 600x700 pixels

# Plot points
sns.scatterplot(data=df, x=0, y=1, alpha=0.7)

# Add term labels
for _, row in df.iterrows():
    plt.text(row[0]+0.01, row[1]+0.01, row['term_str'], fontsize=8)

# Axes and layout
plt.xlabel("Component 0")
plt.ylabel("Component 1")
plt.title("Term Loadings on Topic Components")
plt.tight_layout()
plt.show()

pca_engine_theta = PCA(5)
THETA_COMPS = pd.DataFrame(pca_engine_theta.fit_transform(normalize(THETA.T.values, norm='l2', axis=1)), index=THETA.T.index)
THETA_COMPS.index.name = 'topic_id'

import matplotlib.pyplot as plt
import seaborn as sns

# Merge or concatenate if needed; assume THETA_COMPS and TOPICS are aligned by index
df = THETA_COMPS.reset_index().copy()
df['title'] = TOPICS.title.values
df['doc_weight_sum'] = TOPICS.doc_weight_sum.values
df['topic_id'] = TOPICS.index if 'topic_id' not in df.columns else df['topic_id']

plt.figure(figsize=(10, 8))  # Approx 700x600

# Scatterplot with point size and color
scatter = sns.scatterplot(
    data=df, x=2, y=3,
    size='doc_weight_sum', hue='title',
    legend=False, alpha=0.7
)

# Add topic ID labels to points
for _, row in df.iterrows():
    plt.text(row[2]+0.1, row[3]+0.1, str(row['topic_id']), fontsize=8)

# Formatting
plt.xlabel("Component 2")
plt.ylabel("Component 3")
plt.title("Topic Distribution: THETA Components")
plt.tight_layout()
plt.show()
px.scatter(THETA_COMPS.reset_index(), 2, 3, size=TOPICS.doc_weight_sum, color=TOPICS.title,
           text='topic_id', hover_name=TOPICS.title, height=600, width=700)

THETA_LOADINGS = pd.DataFrame(pca_engine_theta.components_.T * np.sqrt(pca_engine_theta.explained_variance_), index=THETA.index)

DOCS = pd.DataFrame(DOCS)
DOCS
| doc_str | term_count | ||
|---|---|---|---|
| screenplay_id | scene_id | ||
| joy | 1 | kitchen drive restaurant its | 5 |
| 2 | sample pitch thinking heck spindle for shakes spindleó wrong notion chicken egg here milk shakes milk shakes latter customers order shake establishment its wait before again brand drive motor ability milk shakes mark dollars suckers stick at demand egg logic course bright fella idea beat say thoughtfully anyway | 67 | |
| 4 | car trunk back | 3 | |
| 6 | car sales watch its | 3 | |
| 7 | car customer spot front vast assortment items beef sandwiches tamales peanut butter chili dogs etc | 10 | |
| ... | ... | ... | ... |
| the_social_network | 569 | cool control panic control someone move is news now you no them somebody somebody coke cause there right ice home phone shut moment package desk earlier paper wrapping box box brand business cards business cards it womans voice vo mark | 33 |
| 572 | conference room one left voice lights skyline picture windows her day yeah here mark company day salad something guy testimony myths devil now others steak office settlement agreement gonna settle | 22 | |
| 573 | yeah extra guys disclosure agreement word wife kids jury jury selection jury sees defendant hair likability practice law months jury story chicken werent sorority party night one police question it youve jury minutes animals drunk stupid blogging blogging | 33 | |
| 574 | them scheme things speeding ticket anybody computer minute problem help asshole youre coat briefcase exits computer name search box name picture 07 smiles mouse forth boxes fiadd box request friend clicks homepage waits response hits settlement dollars disclosure agreement sixth hits settlement name masthead founder | 35 | |
| 575 | chair night to members countries dollars billionaire world waits waits | 7 |
1846 rows × 2 columns
DOCS['doc_label'] = DOCS.apply(lambda x: f"{LIB.loc[x.name[0]].raw_title}-{x.name[1]}", axis=1)
DOCS['screen_play'] = DOCS.apply(lambda x: f"{LIB.loc[x.name[0]].raw_title}", axis=1)
DOCS['n_chars'] = DOCS.doc_str.str.len()

import matplotlib.pyplot as plt
import seaborn as sns

# Combine THETA_LOADINGS with DOCS metadata
df = THETA_LOADINGS.reset_index().copy()
df['screen_play'] = DOCS['screen_play'].values  # assumes alignment by index

plt.figure(figsize=(12, 8))  # approx 900x600

# Scatterplot with color by screen_play
sns.scatterplot(
    data=df, x=0, y=1,
    hue='screen_play', palette='tab10', alpha=0.7
)

plt.xlabel("Component 0")
plt.ylabel("Component 1")
plt.title("THETA Loadings by Screenplay")
plt.legend(title='Screenplay', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

px.scatter(THETA_LOADINGS.reset_index(), 0, 1,
           # size=DOCS.n_chars,
           color=DOCS.screen_play,
           height=600, width=900)

PHI_LOADINGS.head()
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| term_str | ||||
| 05 | -0.001056 | -0.000118 | -0.000036 | 0.003184 |
| 1350000 | -0.002852 | 0.002178 | 0.000811 | -0.001873 |
| aback | -0.000044 | -0.001774 | 0.001317 | 0.000709 |
| ability | 0.004477 | 0.002495 | -0.001235 | 0.001143 |
| absorbing | 0.002355 | -0.001708 | 0.003512 | -0.000382 |
THETA_LOADINGS
| 0 | 1 | 2 | 3 | 4 | ||
|---|---|---|---|---|---|---|
| screenplay_id | scene_id | |||||
| joy | 1 | -0.002304 | -0.003213 | 0.005371 | 0.003894 | -0.006513 |
| 2 | 0.002661 | -0.000563 | 0.000541 | 0.002004 | 0.013783 | |
| 4 | 0.009537 | 0.000668 | 0.000696 | -0.000457 | -0.002512 | |
| 6 | 0.009537 | 0.000668 | 0.000696 | -0.000457 | -0.002512 | |
| 7 | 0.001526 | 0.007347 | 0.002287 | 0.001414 | 0.001315 | |
| ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | 0.003603 | -0.000761 | -0.004378 | -0.000921 | 0.009922 |
| 572 | 0.002200 | -0.001178 | -0.001982 | 0.004720 | 0.001502 | |
| 573 | -0.002736 | -0.009356 | -0.000108 | -0.001901 | -0.002036 | |
| 574 | -0.004560 | 0.001273 | 0.000453 | -0.008940 | 0.001517 | |
| 575 | 0.003153 | 0.001361 | -0.001179 | 0.001994 | -0.000507 |
1846 rows × 5 columns
DOCS
| doc_str | term_count | doc_label | screen_play | n_chars | ||
|---|---|---|---|---|---|---|
| screenplay_id | scene_id | |||||
| joy | 1 | kitchen drive restaurant its | 5 | Joy-1 | Joy | 28 |
| 2 | sample pitch thinking heck spindle for shakes spindleó wrong notion chicken egg here milk shakes milk shakes latter customers order shake establishment its wait before again brand drive motor ability milk shakes mark dollars suckers stick at demand egg logic course bright fella idea beat say thoughtfully anyway | 67 | Joy-2 | Joy | 312 | |
| 4 | car trunk back | 3 | Joy-4 | Joy | 14 | |
| 6 | car sales watch its | 3 | Joy-6 | Joy | 19 | |
| 7 | car customer spot front vast assortment items beef sandwiches tamales peanut butter chili dogs etc | 10 | Joy-7 | Joy | 98 | |
| ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | cool control panic control someone move is news now you no them somebody somebody coke cause there right ice home phone shut moment package desk earlier paper wrapping box box brand business cards business cards it womans voice vo mark | 33 | The Social Network-569 | The Social Network | 235 |
| 572 | conference room one left voice lights skyline picture windows her day yeah here mark company day salad something guy testimony myths devil now others steak office settlement agreement gonna settle | 22 | The Social Network-572 | The Social Network | 196 | |
| 573 | yeah extra guys disclosure agreement word wife kids jury jury selection jury sees defendant hair likability practice law months jury story chicken werent sorority party night one police question it youve jury minutes animals drunk stupid blogging blogging | 33 | The Social Network-573 | The Social Network | 255 | |
| 574 | them scheme things speeding ticket anybody computer minute problem help asshole youre coat briefcase exits computer name search box name picture 07 smiles mouse forth boxes fiadd box request friend clicks homepage waits response hits settlement dollars disclosure agreement sixth hits settlement name masthead founder | 35 | The Social Network-574 | The Social Network | 316 | |
| 575 | chair night to members countries dollars billionaire world waits waits | 7 | The Social Network-575 | The Social Network | 70 |
1846 rows × 5 columns
Second Two Topics¶
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Get mean topic weight across documents
topic_mean_weight = THETA.mean(axis=0)  # Series with index T00, T01, ..., T39

# Step 2: Flatten PHI into long format
phi_long = PHI.reset_index().melt(id_vars='topic_id', var_name='term_str', value_name='phi_weight')

# Step 3: Merge mean topic weight into PHI table
phi_long['mean_doc_weight'] = phi_long['topic_id'].map(topic_mean_weight)

# Step 4: Run dimensionality reduction (e.g., PCA) to get x and y coords
from sklearn.decomposition import PCA
phi_matrix = PHI.values
pca = PCA(n_components=2)
xy = pca.fit_transform(phi_matrix)

# Create topic -> (x, y) map
topic_xy = pd.DataFrame(xy, columns=['x', 'y'], index=PHI.index)

# Step 5: Merge x, y into phi_long
phi_long = phi_long.merge(topic_xy, left_on='topic_id', right_index=True)

# Step 6: Plot with seaborn + matplotlib
plt.figure(figsize=(10, 8))
sns.scatterplot(
    data=phi_long, x='x', y='y',
    size='mean_doc_weight', sizes=(20, 300),
    alpha=0.6, legend=False
)

# Optional: Label each topic
for tid, row in topic_xy.iterrows():
    plt.text(row['x'], row['y'], tid, fontsize=10, ha='center', va='bottom')

plt.xlabel("PCA Component 1")
plt.ylabel("PCA Component 2")
plt.title("Topics in PHI Loadings (sized by mean document weight)")
plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Compute mean doc weight per topic (assuming THETA columns are topics)
term_mean_weight = THETA.mean(axis=0)  # Series indexed by topic_id

# Step 2: Join to PHI on topic_id
# NOTE: this cell operates on PHI, whose rows are topics, so its reset
# index has no 'term_str' column; the label lookup below raises the
# KeyError shown beneath. The next cell uses PHI_LOADINGS instead.
df = PHI.reset_index().copy()
df['mean_doc_weight'] = df['topic_id'].map(term_mean_weight)

# Step 3: Plot with size proportional to mean_doc_weight
plt.figure(figsize=(10, 8))
sns.scatterplot(
    data=df, x=2, y=3,
    size='mean_doc_weight', sizes=(20, 200),  # Adjust min and max dot sizes
    alpha=0.7, legend=False
)

# Add text labels
for _, row in df.iterrows():
    plt.text(row[2], row[3], row['term_str'], fontsize=9, ha='center', va='bottom')

plt.xlabel("Component 2")
plt.ylabel("Component 3")
plt.title("PHI Loadings (Component 2 vs 3), Sized by Mean Document Weight")
plt.tight_layout()
plt.show()

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[71], line 25
---> 25 plt.text(row[2], row[3], row['term_str'], fontsize=9, ha='center', va='bottom')
KeyError: 'term_str'
import matplotlib.pyplot as plt
import seaborn as sns

# Prepare DataFrame
df = PHI_LOADINGS.reset_index()

plt.figure(figsize=(10, 8))  # Approx 700x600 in pixels

# Basic scatterplot
sns.scatterplot(data=df, x=2, y=3, alpha=0.7)

# Add term labels on top of each point
for _, row in df.iterrows():
    plt.text(row[2], row[3], row['term_str'], fontsize=9, ha='center', va='bottom')

plt.xlabel("Component 2")
plt.ylabel("Component 3")
plt.title("PHI Loadings (Component 2 vs 3)")
plt.tight_layout()
plt.show()
THETA
| topic_id | T00 | T01 | T02 | T03 | T04 | T05 | T06 | T07 | T08 | T09 | ... | T30 | T31 | T32 | T33 | T34 | T35 | T36 | T37 | T38 | T39 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| screenplay_id | scene_id | |||||||||||||||||||||
| joy | 1 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | ... | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 | 0.004167 |
| 2 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | ... | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | 0.000368 | |
| 4 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | ... | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | |
| 6 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | ... | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | 0.006250 | |
| 7 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.911364 | 0.002273 | 0.002273 | ... | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | 0.002273 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | ... | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.971324 | 0.000735 | 0.000735 |
| 572 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | ... | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | 0.001087 | |
| 573 | 0.000735 | 0.000735 | 0.000735 | 0.971324 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | ... | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | 0.000735 | |
| 574 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | ... | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | 0.000694 | |
| 575 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | ... | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 | 0.003125 |
1846 rows × 40 columns
import matplotlib.pyplot as plt
import seaborn as sns

# Prepare DataFrame
df = THETA_LOADINGS.reset_index().copy()
# NOTE: DOCS is indexed by (screenplay_id, scene_id), so mapping on
# screenplay_id alone finds no matches and yields the NaN column below.
df['screen_play'] = df['screenplay_id'].map(DOCS['screen_play'])

plt.figure(figsize=(12, 8))  # Approx 900x600 in pixels

# Scatter plot with color by screen_play
sns.scatterplot(data=df, x=2, y=3, hue='screenplay_id', palette='tab10', s=60, alpha=0.8)

plt.xlabel("Component 02")
plt.ylabel("Component 03")
plt.title("THETA Loadings Colored by Screenplay")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

df
| screenplay_id | scene_id | 0 | 1 | 2 | 3 | 4 | screen_play | |
|---|---|---|---|---|---|---|---|---|
| 0 | joy | 1 | -0.002304 | -0.003213 | 0.005371 | 0.003894 | -0.006513 | NaN |
| 1 | joy | 2 | 0.002661 | -0.000563 | 0.000541 | 0.002004 | 0.013783 | NaN |
| 2 | joy | 4 | 0.009537 | 0.000668 | 0.000696 | -0.000457 | -0.002512 | NaN |
| 3 | joy | 6 | 0.009537 | 0.000668 | 0.000696 | -0.000457 | -0.002512 | NaN |
| 4 | joy | 7 | 0.001526 | 0.007347 | 0.002287 | 0.001414 | 0.001315 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1841 | the_social_network | 569 | 0.003603 | -0.000761 | -0.004378 | -0.000921 | 0.009922 | NaN |
| 1842 | the_social_network | 572 | 0.002200 | -0.001178 | -0.001982 | 0.004720 | 0.001502 | NaN |
| 1843 | the_social_network | 573 | -0.002736 | -0.009356 | -0.000108 | -0.001901 | -0.002036 | NaN |
| 1844 | the_social_network | 574 | -0.004560 | 0.001273 | 0.000453 | -0.008940 | 0.001517 | NaN |
| 1845 | the_social_network | 575 | 0.003153 | 0.001361 | -0.001179 | 0.001994 | -0.000507 | NaN |
1846 rows × 8 columns
px.scatter(THETA_LOADINGS.reset_index(), 2, 3,
           # size=DOCS.n_chars,
           color=DOCS.screen_play,
           height=600, width=900)

TPAIRS.to_csv(f"{output_dir}/{data_prefix}-TOPICPAIRS-{n_topics}.csv", index=True)
LIB.to_csv(f"{output_dir}/{data_prefix}-LIB-KEY.csv", index=True)
VOCAB.to_csv(f"{output_dir}/{data_prefix}-VOCAB2.csv", index=True)
THETA_LOADINGS.to_csv(f"{output_dir}/{data_prefix}-THETA_LOADINGS.csv", index=True)
THETA_COMPS.to_csv(f"{output_dir}/{data_prefix}-THETA_COMPS.csv", index=True)

Final Project Notebook¶
DS 5001 Text as Data | Spring 2025
Metadata¶
- Full Name: Gabriella Cordelli
- Userid:
- GitHub Repo URL: https://github.com/GEMcordelli/Text-Analytics-Project-Digital-Analytical-Addition
- UVA Box URL:
Overview¶
The goal of the final project is for you to create a digital analytical edition of a corpus using the tools, practices, and perspectives you've learned in this course. You will select a corpus that has already been digitized and transcribed, parse it into an F-compliant set of tables, and then generate and visualize the results of a series of fitted models. You will also draw some tentative conclusions regarding the linguistic, cultural, psychological, or historical features represented by your corpus. The point of the exercise is to have you work with a corpus through the entire pipeline, from ingestion to interpretation.
Specifically, you will acquire a collection of long-form texts and perform the following operations:
- Convert the collection from their source formats (F0) into a set of tables that conform to the Standard Text Analytic Data Model (F2).
- Annotate these tables with statistical and linguistic features using NLP libraries such as NLTK (F3).
- Produce a vector representation of the corpus to generate TFIDF values to add to the TOKEN (aka CORPUS) and VOCAB tables (F4).
- Model the annotated and vectorized corpus with tables and features derived from the application of unsupervised methods, including PCA, LDA, and word2vec (F5).
- Explore your results using statistical and visual methods.
- Present conclusions about patterns observed in the corpus by means of these operations.
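As a toy illustration of the early stages of this pipeline (F0 through F4), the sketch below parses a hypothetical two-document corpus into a token table and derives a VOCAB with counts and document frequencies. All names and data here are illustrative, not taken from the project corpus.

```python
import pandas as pd

# Hypothetical two-document mini-corpus standing in for the raw (F0) sources.
raw = {'doc_a': "the cat sat", 'doc_b': "the dog ran"}

# F2: parse into a token table indexed by (doc_id, token_num).
rows = [(doc, i, w) for doc, text in raw.items()
        for i, w in enumerate(text.split())]
CORPUS = pd.DataFrame(rows, columns=['doc_id', 'token_num', 'term_str']) \
           .set_index(['doc_id', 'token_num'])

# F3/F4: derive a VOCAB table with corpus-wide counts and document frequency.
VOCAB = CORPUS.groupby('term_str').size().to_frame('n')
VOCAB['df'] = CORPUS.reset_index().groupby('term_str').doc_id.nunique()
```

The same groupby pattern scales to the real corpus; only the parsing step (F0 to F2) changes with the source format.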
When you are finished, you will make the results of your work available in GitHub (for code) and UVA Box (for data). You will submit to Gradescope (via Canvas) a PDF version of a Jupyter notebook that contains the information listed below.
Some Details¶
- Please fill out your answers in each task below by editing the markdown cell.
- Replace text that asks you to insert something with the thing, i.e. replace (INSERT IMAGE HERE) with an image element.
- For URLs, just paste the raw URL directly into the text area. Don't worry about providing link labels using [label](link).
- Please do not alter the structure of the document or cell, i.e. the bulleted lists.
- You may add explanatory paragraphs below the bulleted lists.
- Please name your tables as they are named in each task below.
- Tasks are indicated by headers with point values in parentheses.
Raw Data¶
Source Description (1)¶
Provide a brief description of your source material, including its provenance and content. Tell us where you found it and what kind of content it contains.
(INSERT DESCRIPTION HERE)
Source Features (1)¶
Add values for the following items. (Do this for all following bulleted lists.)
- Source URL:
- UVA Box URL:
- Number of raw documents:
- Total size of raw documents (e.g. in MB):
- File format(s), e.g. XML, plaintext, etc.:
Source Document Structure (1)¶
Provide a brief description of the internal structure of each document. That is, describe the typical elements found in each document and their relation to each other. For example, a corpus of letters might be described as having a date, an addressee, a salutation, a set of content paragraphs, and a closing. If the documents have varying structures, state that.
(INSERT DESCRIPTION HERE)
Parsed and Annotated Data¶
Parse the raw data into the three core tables of your addition: the LIB, CORPUS, and VOCAB tables.
These tables will be stored as CSV files with header rows.
You may consider using | as a delimiter.
Provide the following information for each.
LIB (2)¶
The source documents the corpus comprises. These may be books, plays, newspaper articles, abstracts, blog posts, etc.
Note that these are not documents in the sense used to describe a bag-of-words representation of a text, e.g. chapter.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
- Number of observations:
- List of features, including at least three that may be used for model summarization (e.g. date, author, etc.):
- Average length of each document in characters:
## CORPUS (2)
The sequence of word tokens in the corpus, indexed by their location in the corpus and document structures.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
- Number of observations (should be between 500,000 and 2,000,000):
- OHCO Structure (as delimited column names):
- Columns (as delimited column names, including `token_str`, `term_str`, `pos`, and `pos_group`):
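To make the expected shape of the CORPUS table concrete, here is a minimal sketch with toy rows (the token values are hypothetical): one row per token, indexed by its OHCO position, carrying the required columns.

```python
import pandas as pd

# OHCO index used for the CORPUS (matches the index used elsewhere in this project)
OHCO = ['screenplay_id', 'scene_id', 'para_num', 'sent_num', 'token_num']

# Toy CORPUS: the token stream, one row per token, indexed by OHCO position
CORPUS = pd.DataFrame([
    ('joy', 1, 0, 0, 0, 'The',     'the',     'DT', 'DT'),
    ('joy', 1, 0, 0, 1, 'kitchen', 'kitchen', 'NN', 'NN'),
], columns=OHCO + ['token_str', 'term_str', 'pos', 'pos_group']).set_index(OHCO)
```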
## VOCAB (2)
The unique word types (terms) in the corpus.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
- Number of observations:
- Columns (as delimited names, including `n`, `p`, `i`, `dfidf`, `porter_stem`, `max_pos` and `max_pos_group`, `stop`):
- Note: Your VOCAB may contain ngrams. If so, add a feature for `ngram_length`.
- List the top 20 significant words in the corpus by DFIDF.
(INSERT LIST HERE)
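As a hedged sketch of how such a DFIDF ranking can be produced: one common definition is `dfidf = df * idf` with `idf = log2(N / df)`, where `N` is the number of bags. The toy `df` values and `N` below are illustrative, not the project's actual counts.

```python
import numpy as np
import pandas as pd

# Toy VOCAB with hypothetical document frequencies
VOCAB = pd.DataFrame({'df': [1936, 1539, 3]}, index=['the', 'a', 'kroc'])
N = 4279  # hypothetical total number of bags in the corpus

VOCAB['idf'] = np.log2(N / VOCAB['df'])       # inverse document frequency
VOCAB['dfidf'] = VOCAB['df'] * VOCAB['idf']   # df-weighted idf significance score
top = VOCAB.sort_values('dfidf', ascending=False)  # .head(20) on the real VOCAB
```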
# Derived Tables
## BOW (3)
A bag-of-words representation of the CORPUS.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
- Bag (expressed in terms of OHCO levels):
- Number of observations:
- Columns (as delimited names, including `n`, `tfidf`):
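A BOW like this can be derived from a CORPUS by grouping the token stream by bag and term and counting; this sketch uses toy data and the scene level as the bag, which is one possible choice.

```python
import pandas as pd

BAG = ['screenplay_id', 'scene_id']  # the scene level of the OHCO

# Toy CORPUS fragment
CORPUS = pd.DataFrame({
    'screenplay_id': ['joy', 'joy', 'joy', 'joy'],
    'scene_id':      [1, 1, 1, 2],
    'term_str':      ['kitchen', 'kitchen', 'drive', 'mop'],
})

# Count each term within each bag
BOW = CORPUS.groupby(BAG + ['term_str']).size().to_frame('n')
```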
## DTM (3)
A representation of the BOW as a sparse count matrix.
- UVA Box URL:
- UVA Box URL of BOW used to generate (if applicable):
- GitHub URL for notebook used to create:
- Delimiter:
- Bag (expressed in terms of OHCO levels):
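In pandas terms, a DTM is the BOW pivoted to wide form: one row per bag, one column per term, zeros where a term does not occur. A minimal sketch with a toy BOW:

```python
import pandas as pd

# Toy BOW with a (screenplay_id, scene_id, term_str) index
BOW = pd.DataFrame(
    {'n': [2, 1, 1]},
    index=pd.MultiIndex.from_tuples(
        [('joy', 1, 'kitchen'), ('joy', 1, 'drive'), ('joy', 2, 'mop')],
        names=['screenplay_id', 'scene_id', 'term_str']),
)

# Pivot the innermost level (term_str) into columns; absent terms become 0
DTM = BOW['n'].unstack(fill_value=0)
```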
## TFIDF (3)
A Document-Term matrix with TFIDF values.
- UVA Box URL:
- UVA Box URL of DTM or BOW used to create:
- GitHub URL for notebook used to create:
- Delimiter:
- Description of TFIDF formula (LaTeX OK):
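One common TFIDF variant can be sketched in a few lines of pandas. The specific normalization choices below (length-normalized tf, base-2 log idf) are assumptions for illustration, not necessarily the ones used in this project.

```python
import numpy as np
import pandas as pd

# Toy DTM: rows are bags, columns are terms
DTM = pd.DataFrame(
    {'kitchen': [2, 0], 'mop': [0, 1]},
    index=pd.MultiIndex.from_tuples([('joy', 1), ('joy', 2)],
                                    names=['screenplay_id', 'scene_id']))

TF = DTM.div(DTM.sum(axis=1), axis=0)   # tf = n / bag length
DF = (DTM > 0).sum()                    # document frequency per term
IDF = np.log2(len(DTM) / DF)            # idf = log2(N / df)
TFIDF = TF * IDF                        # tfidf = tf * idf
```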
## Reduced and Normalized TFIDF_L2 (3)
A Document-Term matrix with L2 normalized TFIDF values.
- UVA Box URL:
- UVA Box URL of source TFIDF table:
- GitHub URL for notebook used to create:
- Delimiter:
- Number of features (i.e. significant words):
- Principle of significant word selection:
# Models
## PCA Components (4)
- UVA Box URL:
- UVA Box URL of the source TFIDF_L2 table:
- GitHub URL for notebook used to create:
- Delimiter:
- Number of components: 10
- Library used to generate:
- Top 5 positive terms for first component: continued slots noó shown objection draw
- Top 5 negative terms for second component: is more s was are do have be
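For reference, the same 10-component reduction can be sketched with sklearn's `PCA`; the notebook below computes components by eigendecomposition of the covariance matrix instead, so this is an equivalent alternative, shown here on stand-in random data rather than the real TFIDF_L2 matrix.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # stand-in for the (docs x terms) TFIDF_L2 matrix

pca = PCA(n_components=10)
DCM = pca.fit_transform(X)       # document-component matrix (100 x 10)
LOADINGS = pca.components_       # component-term matrix (10 x 20)
```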
## PCA DCM (4)
The document-component matrix generated.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
## PCA Loadings (4)
The component-term matrix generated.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
## PCA Visualization 1 (4)
Include a scatterplot of documents in the space created by the first two components.
Color the points based on a metadata feature associated with the documents.
Also include a scatterplot of the loadings for the same two components. (This does not need a feature mapped onto color.)
#### PCA First
#### PCA Second
Briefly describe the nature of the polarity you see in the first component:
Again, we struggle to see the intricacies without Plotly; however, polarity is on full display in the graph, with two distinct high and low clusters for PC1 and a notable absence of PC0 across most films. The only exception is Steve Jobs, which appears uniquely inconsistent where most other films show a distinctive pattern. This is perhaps because "Steve Jobs" is the only documentary film on our list, and therefore exhibits patterns inconsistent with its scripted counterparts. A documentary does not follow the same dialogue and narrative flow as a typical film.
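For coloring document points by a LIB feature, the document-component matrix can be joined to LIB on `screenplay_id` before plotting. This is a sketch with toy frames; the `DCM` values and `genre` labels here are illustrative.

```python
import pandas as pd

# Toy document-component matrix, indexed by (screenplay_id, scene_id)
DCM = pd.DataFrame({'PC0': [0.1, -0.2], 'PC1': [0.3, 0.0]},
                   index=pd.MultiIndex.from_tuples(
                       [('joy', 1), ('the_help', 2)],
                       names=['screenplay_id', 'scene_id']))

# Toy LIB metadata, indexed by screenplay_id
LIB = pd.DataFrame({'genre': ['comedy/drama', 'drama']},
                   index=pd.Index(['joy', 'the_help'], name='screenplay_id'))

# Join on the screenplay_id index level so each scene row carries its metadata
plot_df = DCM.join(LIB, on='screenplay_id')
# e.g. sns.scatterplot(data=plot_df, x='PC0', y='PC1', hue='genre')
```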
## PCA Visualization 2 (4)
Include a scatterplot of documents in the space created by the second two components.
Color the points based on a metadata feature associated with the documents.
Also include a scatterplot of the loadings for the same two components. (This does not need a feature mapped onto color.)
Briefly describe the nature of the polarity you see in the second component:
In this PCA visual we see a very different pattern than before. Once again, likely for the same reasons as previously, Steve Jobs shows the least consistency across each PC, here to a far greater degree than before, indicating that these PCs likely do not encompass any aspect of the film well. We also see an interesting diagonal clustering towards the positive end of PC2. The Social Network rests closer to the neutral point, and The Big Short is far positive for each. The diagonal gradient for the films between those endpoints is interesting to see: it shows both what brings these entrepreneurial films together and what teases them apart.
## LDA TOPIC (4)
- UVA Box URL:
- UVA Box URL of count matrix used to create:
- GitHub URL for notebook used to create:
- Delimiter:
- Library used to compute: sklearn
- A description of any filtering, e.g. POS (Nouns and Verbs only): Nouns & Adjectives
- Number of components: 39
- Any other parameters used:
- Top 5 words and best-guess labels for the top five topics by mean document weight:
- T00: continued lot arches angles parking
- T01: bonds mortgage banks car mortgage
- T02: computer room moment door skeeter
- T03: yule skeeter os door map
- T04: girls kitchen door room skeeter
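The sklearn pipeline described above can be sketched as follows; the toy documents and parameter values here are illustrative, not the ones that produced the reported topics.

```python
from sklearn.decomposition import LatentDirichletAllocation as LDA
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus: one string per bag
docs = ['burgers fries arches restaurant',
        'banks bonds mortgage shorts',
        'computer code server network']

counts = CountVectorizer().fit_transform(docs)          # sparse count matrix
lda = LDA(n_components=2, max_iter=10, random_state=42)
THETA = lda.fit_transform(counts)   # document-topic weights (docs x topics)
PHI = lda.components_               # topic-term weights (topics x terms)
```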
## LDA THETA (4)
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
## LDA PHI (4)
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
## LDA + PCA Visualization (4)
Apply PCA to the PHI table and plot the topics in the space opened by the first two components.
Size the points based on the mean document weight of each topic (using the THETA table).
Color the points based on a metadata feature from the LIB table.
Provide a brief interpretation of what you see.
PHI with Theta scatter Size¶
Because my Plotly was not rendering properly throughout the semester, I am not able to see the granular details of what is going on in the lower left corner of the graph, which is likely where most of the components' intricacies lie. Even so, what this does tell us is that components 1 and 2 contain very subtle differences for the most part, while there is an interesting and glaring outlier polarity in loadings T27 and T19. I think this may be because these loadings output fairly film-specific categories. For example, T27's vocabulary appears to largely pull from The Founder and generally has a commercial fast-food bend: continued lot arches angles parking lot parking cover restaurant things.
## Sentiment VOCAB_SENT (4)
Sentiment values associated with a subset of the VOCAB from a curated sentiment lexicon.
- UVA Box URL:
- UVA Box URL for source lexicon: https://www.dropbox.com/scl/fo/kdrbta82xj975r7eaipni/AHjys-3CkEhLKVllLBSAZRs/lexicons?dl=0&preview=salex_nrc.csv&rlkey=ucswokipct8i2g0fxemosbx81&subfolder_nav_tracking=1
- GitHub URL for notebook used to create:
- Delimiter:
## Sentiment BOW_SENT (4)
Sentiment values from VOCAB_SENT mapped onto BOW.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
## Sentiment DOC_SENT (4)
Sentiment per bag, computed from BOW_SENT.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter:
- Document bag expressed in terms of OHCO levels: Paragraphs
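A DOC_SENT table can be computed by weighting each term's count by its sentiment value and summing up to the chosen bag level; a minimal sketch with a toy BOW_SENT (one common recipe, assumed here rather than taken from the project):

```python
import pandas as pd

PARA = ['screenplay_id', 'scene_id', 'para_num']  # bag = paragraph level

# Toy BOW_SENT: counts plus a per-term sentiment value from the lexicon
BOW_SENT = pd.DataFrame(
    {'n': [2, 1, 3], 'sentiment': [0.5, -1.0, 0.2]},
    index=pd.MultiIndex.from_tuples(
        [('joy', 1, 0, 'happy'), ('joy', 1, 0, 'angry'), ('joy', 1, 1, 'nice')],
        names=PARA + ['term_str']))

# Weight counts by sentiment, then sum within each bag
BOW_SENT['sent_weight'] = BOW_SENT['n'] * BOW_SENT['sentiment']
DOC_SENT = BOW_SENT.groupby(PARA)['sent_weight'].sum()
```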
## Sentiment Plot (4)
Plot sentiment over some metric space, such as time.
If you don't have a metric metadata feature, plot sentiment over a feature of your choice.
You may use a bar chart or a line graph.
#### scene_id as a proxy for narrative time (in terms of desired layout, not chronology, because the film is not in order of events)
## VOCAB_W2V (4)
A table of word2vec features associated with terms in the VOCAB table.
- UVA Box URL:
- GitHub URL for notebook used to create:
- Delimiter: ,
- Document bag expressed in terms of OHCO levels: Paragraphs
- Number of features generated:
- The library used to generate the embeddings: Gensim
## Word2vec tSNE Plot (4)
Plot word embedding features in two dimensions using t-SNE.
#### Had to use seaborn and matplotlib because my Plotly has been consistently outputting no data all semester, even though the data frames I'm feeding it are correct
There is a small cluster in the (-10, 3) area of the graph that has an interesting theme of novelty and youth. Words such as "new, young, little, against & few" are indicative of the association between recounting stories of entrepreneurial innovation and depictions of innovation that rubs against the grain of society. Words like "few" and "against" make me think of the notion that traditional self-made American stories are framed as "breaking the mold" and thinking in ways that people may not agree with at first. For example, early in The Social Network, naysayers argue that there are other platforms that already have more appeal than Facebook ever would, and Steve Jobs is remembered for struggling early in his career to get people to "see his vision". This phenomenon is additionally often associated with youth; a fresh mind is a flexible mind. For example, in The Help, Emma Stone's character Skeeter is a new graduate when she sets out to write a revolutionary account of Black maids' experiences with their white bosses in 1960s Jackson, Mississippi.
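The 2-D projection behind such a plot can be sketched with sklearn's `TSNE`; the stand-in random matrix below takes the place of the real VOCAB_W2V feature matrix, and the perplexity value is an illustrative choice (it must be smaller than the number of points).

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
W2V = rng.normal(size=(60, 50))   # stand-in for the (terms x features) VOCAB_W2V

# Project the embeddings to 2-D for scatter plotting
coords = TSNE(n_components=2, perplexity=10, random_state=1).fit_transform(W2V)
```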
# Riffs
Provide at least three visualizations that combine the preceding model data in interesting ways.
These should provide insight into how features in the LIB table are related.
The nature of this relationship is left open to you -- it may be correlation, or mutual information, or something less well defined.
In doing so, consider the following visualization types:
- Hierarchical cluster diagrams
- Heatmaps
- Scatter plots
- KDE plots
- Dispersion plots
- t-SNE plots
- etc.
## Riff 2 (5)
#### The Social Network ~ Trust & Anger
I chose to look at patterns of trust and anger in The Social Network because the commodification of entrepreneurs in media values the dramatic fluctuations in trust and stability as an idea begins to gain traction, or when it is being doubted. In The Social Network, this pressure results in the deterioration of friendships. Interestingly, along the lines of the importance of relationships in The Social Network, trust and anger share a distinct trade-off from the beginning to the end of the film, likely having to do with the storyline of Mark Zuckerberg and Eduardo Saverin's close friendship in college and partnership in the brand, through the moment of depicted betrayal as Eduardo is phased out by Mark and his business consultants.
## Riff 1 (5)
#### Whole Dataset Hierarchical Cluster Based on Word2vec ~ Screenplay Title as Metadata
There appear to be a lot of narrow, long distances towards the top of the hierarchical cluster. While the LIB metadata was not directly used in this graph, its outputs tell us something about the nature of the data from document to document. I say this because throughout the clusters there are term strings comprised mostly of character names. Here, and at other points through my analysis, this is particularly visible with The Help. It appears that addressing an individual by character name is particularly prevalent for this film. One theory as to why is that The Help takes place in the South in the 1960s and is largely centered around women who are stay-at-home mothers. In such an environment and time period, where decorum is valued and social networks are tight-knit, characters may be more inclined to call each other formally by name. The Help has two distinctive social groups throughout the plot: the white housewives and their Black maids. Among themselves, the housewives value formality, so they often refer to each other by first name, or perhaps "Miss". The maids are employees of the housewives, and therefore almost never refer to their bosses informally. Alternatively, their bosses almost exclusively refer to their maids informally, using first names or nicknames. There are also quite a few nicknames in the cluster, which I believe was a choice of the writers to enhance the feeling that this circle of friends was well established and had ties to one another all the way from childhood (for example, Emma Stone's character going by Skeeter).
## Riff 3 (5)
#### Joy
#### Joy-Sadness Trade-Off for The Social Network
I chose to compare the sentiments of The Social Network and Joy because they are both stories about an entrepreneur's journey to success. While the character Joy in the film Joy was more of a genuine independent, I thought it would be interesting to compare this story with Jesse Eisenberg's portrayal of Mark Zuckerberg, because in the film Mark falls in and out of many relationships but maintains an internal, stubborn independence. Joy and sadness remain far more consistent in The Social Network, whereas in the film Joy there is a large spike at the end of the first third of the movie. Joy in this film otherwise maintains a rather consistent sentiment trend, whereas The Social Network has a far more temperamental curve. This, again, could have to do with the series of relationship turmoils. In both cases, we see a general downward trend in sadness over the course of the film. In the film Joy there is an inverse relationship between sadness and joy towards the film's end, whereas in The Social Network they grow together. This could have to do with differences in how each film elapsed. Both films involve a legal case that is resolved at the film's climax, but in Joy there is a more positive connotation with her ability to patent her mop invention, because it was a long-standing struggle which she overcame. However, in The Social Network, there is an air of accomplishment that overlays a larger gloom: the lawsuit settled out of court, money was owed to those who filed against Facebook, and friendships fell apart, but Mark Zuckerberg still maintained an elevated lifestyle and was generally considered the prodigy brainchild of the company.
# Interpretation (4)
Describe something interesting about your corpus that you discovered during the process of completing this assignment.
At a minimum, use 250 words, but you may use more. You may also add images if you'd like.
(INSERT INTERPRETATION HERE)
```python
import pandas as pd
import numpy as np
from scipy.linalg import norm
import plotly_express as px
import seaborn as sns
sns.set(style='ticks')

import configparser
config = configparser.ConfigParser()
config.read("env.ini")
data_home = config['DEFAULT']['data_home']
output_dir = config['DEFAULT']['output_dir']
local_lib = config['DEFAULT']['local_lib']

data_prefix = 'entrepreneur'
OHCO = ['screenplay_id', 'scene_id']
colors = "YlGnBu"

LIB = pd.read_csv(f'{output_dir}/entrepreneur-LIB.csv').set_index('screenplay_id')
VOCAB = pd.read_csv(f'{output_dir}/{data_prefix}-VOCAB-PARAS.csv').set_index('term_str')
BOW = pd.read_csv(f'{output_dir}/{data_prefix}-BOW-PARAS.csv').set_index(OHCO + ['term_str'])

VOCAB[['n', 'p', 'i']].head(20)
```

| term_str | n | p | i |
|---|---|---|---|
| the | 6340 | 0.042129 | 4.569051 |
| a | 4009 | 0.026639 | 5.230291 |
| to | 3642 | 0.024201 | 5.368802 |
| and | 2999 | 0.019928 | 5.649052 |
| you | 2737 | 0.018187 | 5.780938 |
| i | 2271 | 0.015091 | 6.050206 |
| of | 2159 | 0.014346 | 6.123170 |
| in | 1841 | 0.012233 | 6.353044 |
| it | 1585 | 0.010532 | 6.569051 |
| kroc | 1501 | 0.009974 | 6.647609 |
| on | 1399 | 0.009296 | 6.749137 |
| is | 1398 | 0.009290 | 6.750169 |
| steve | 1346 | 0.008944 | 6.804855 |
| that | 1074 | 0.007137 | 7.130539 |
| mark | 962 | 0.006392 | 7.289425 |
| with | 903 | 0.006000 | 7.380736 |
| ray | 889 | 0.005907 | 7.403278 |
| we | 888 | 0.005901 | 7.404902 |
| at | 876 | 0.005821 | 7.424531 |
| he | 850 | 0.005648 | 7.467999 |
```python
BOW
```

| screenplay_id | scene_id | term_str | para_num | n | tf | tfidf |
|---|---|---|---|---|---|---|
| joy | 1 | a | 0 | 1 | 0.090909 | 0.134575 |
| drive | 0 | 1 | 0.090909 | 0.603804 | ||
| in | 0 | 1 | 0.090909 | 0.177906 | ||
| its | 0 | 1 | 0.090909 | 0.302422 | ||
| kitchen | 0 | 1 | 0.090909 | 0.525700 | ||
| ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 575 | waits | 1 | 2 | 0.040000 | 0.330430 |
| we | 1 | 1 | 0.020000 | 0.058725 | ||
| world | 1 | 1 | 0.020000 | 0.125735 | ||
| youngest | 1 | 1 | 0.020000 | 0.241362 | ||
| zuckerberg | 1 | 1 | 0.020000 | 0.144203 |
109759 rows × 4 columns
```python
BOW_reduced = BOW.groupby(['screenplay_id', 'scene_id', 'term_str'])['tfidf'].sum().unstack(fill_value=0)
TFIDF = BOW_reduced
pos_set = ['NN', 'VB']
VOCAB['dfidf'] = VOCAB['df'] * VOCAB['idf']
VSHORT = VOCAB[VOCAB.max_pos_group.isin(['NN', 'VB', 'JJ']) & ~VOCAB.max_pos.isin(['NNP'])].sort_values('dfidf', ascending=False).head(5000)
TFIDF = TFIDF[VSHORT.index]
VOCAB
```

| term_str | term_rank | n | n_chars | p | i | max_pos | max_pos_group | n_pos_group | cat_pos_group | n_pos | cat_pos | stop | term_rank2 | zipf_k | zipf_k2 | log_r | df | idf | dfidf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| the | 1 | 6340 | 3 | 0.042129 | 4.569051 | DT | DT | DT | {'DT', 'JJ', 'NN', 'VB'} | 5 | {'DT', 'NN', 'VBP', 'NNP', 'JJ'} | 1 | 1 | 6340 | 6340 | 0.000000 | 1936 | 1.149243 | 2224.934910 |
| a | 2 | 4009 | 1 | 0.026639 | 5.230291 | DT | DT | DT | {'DT', 'JJ', 'NN'} | 5 | {'DT', 'NN', 'NNS', 'NNP', 'JJ'} | 1 | 2 | 8018 | 8018 | 1.000000 | 1539 | 1.480329 | 2278.226269 |
| to | 3 | 3642 | 2 | 0.024201 | 5.368802 | TO | TO | TO | {'NN', 'JJ', 'TO', 'RP', 'IN', 'VB'} | 12 | {'NN', 'NNS', 'VBZ', 'VBP', 'NNP', 'JJ', 'TO',... | 1 | 3 | 10926 | 10926 | 1.584963 | 1608 | 1.417055 | 2278.624094 |
| and | 4 | 2999 | 3 | 0.019928 | 5.649052 | CC | CC | CC | {'NN', 'RB', 'CC', 'IN', 'VB'} | 6 | {'NN', 'VBP', 'NNP', 'RB', 'CC', 'IN'} | 1 | 4 | 11996 | 11996 | 2.000000 | 1459 | 1.557342 | 2272.162427 |
| you | 5 | 2737 | 3 | 0.018187 | 5.780938 | PRP | PR | PR | {'NN', 'RB', 'JJ', 'PD', 'IN', 'CD', 'VB', 'PR'} | 15 | {'JJR', 'NN', 'NNS', 'VBZ', 'VBP', 'NNP', 'RB'... | 1 | 5 | 13685 | 13685 | 2.321928 | 1109 | 1.953063 | 2165.946674 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| jd | 10396 | 1 | 2 | 0.000007 | 17.199318 | NNP | NN | NN | {'NN'} | 1 | {'NNP'} | 0 | 259 | 10396 | 259 | 13.343741 | 1 | 12.068106 | 12.068106 |
| jaredó | 10397 | 1 | 6 | 0.000007 | 17.199318 | NNP | NN | NN | {'NN'} | 1 | {'NNP'} | 0 | 259 | 10397 | 259 | 13.343880 | 1 | 12.068106 | 12.068106 |
| jar | 10398 | 1 | 3 | 0.000007 | 17.199318 | NN | NN | NN | {'NN'} | 1 | {'NN'} | 0 | 259 | 10398 | 259 | 13.344018 | 1 | 12.068106 | 12.068106 |
| jammed | 10399 | 1 | 6 | 0.000007 | 17.199318 | JJ | JJ | JJ | {'JJ'} | 1 | {'JJ'} | 0 | 259 | 10399 | 259 | 13.344157 | 1 | 12.068106 | 12.068106 |
| flwhy | 10400 | 1 | 4 | 0.000007 | 17.199318 | NNP | NN | NN | {'NN'} | 1 | {'NNP'} | 0 | 259 | 10400 | 259 | 13.344296 | 1 | 12.068106 | 12.068106 |
10400 rows × 19 columns
### Adding Some Labels
```python
genre_csv = """
joy, comdey/drama, 2015
the_founder, gothic, 2016
the_social_network, drama/historical_fiction, 2009
steve_jobs, drama/history, 2015
the_help, drama/historical_fiction, 2011
the_big_short, comedy/thriller, 2015
""".split('\n')[1:-1]

genre = pd.DataFrame([line.split(', ') for line in genre_csv],
                     columns=['screenplay_id', 'genre', 'year'])
genre['book_id'] = genre.screenplay_id  # item assignment; attribute assignment would not create the column
genre = genre.set_index('screenplay_id')
```
```python
LIB = pd.concat([LIB, genre], axis=1)

LIB['title'] = LIB['raw_title']
# LIB = LIB.drop(['raw_title'])
LIB_COLS = ['title', 'genre', 'year']
# LIB = LIB.drop(['source_file_path', 'scene_regex'])
LIB[LIB_COLS].head()
LIB
```

| screenplay_id | source_file_path | raw_title | scene_regex | genre | year | title |
|---|---|---|---|---|---|---|
| joy | /sfs/weka/scratch/gec2tp/data/entrepreneur/Joy... | Joy | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | comdey/drama | 2015 | Joy |
| steve_jobs | /sfs/weka/scratch/gec2tp/data/entrepreneur/Ste... | Steve Jobs | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | drama/history | 2015 | Steve Jobs |
| the_big_short | /sfs/weka/scratch/gec2tp/data/entrepreneur/The... | The Big Short | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | comedy/thriller | 2015 | The Big Short |
| the_founder | /sfs/weka/scratch/gec2tp/data/entrepreneur/The... | The Founder | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | gothic | 2016 | The Founder |
| the_help | /sfs/weka/scratch/gec2tp/data/entrepreneur/The... | The Help | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | drama/historical_fiction | 2011 | The Help |
| the_social_network | /sfs/weka/scratch/gec2tp/data/entrepreneur/The... | The Social Network | ^(INT\.|EXT\.|INT/EXT\.|EXT/INT\.|SCENE\s+\d{1... | drama/historical_fiction | 2009 | The Social Network |
## PCA
```python
TFIDF_L2 = (TFIDF.T / norm(TFIDF, 2, axis=1)).T
TFIDF_L2
```

| screenplay_id | scene_id | is | are | have | be | do | know | was | get | dont | were | ... | amble | angers | amfl | andthen | andmr | andi | anchoring | ample | appreciates | appreciative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| joy | 1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.000000 | 0.067258 | 0.035151 | 0.071038 | 0.107690 | 0.076506 | 0.000000 | 0.000000 | 0.040008 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 6 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 7 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 570 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 572 | 0.126864 | 0.057698 | 0.000000 | 0.000000 | 0.000000 | 0.065632 | 0.132038 | 0.068215 | 0.000000 | 0.072352 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 573 | 0.000000 | 0.000000 | 0.056207 | 0.000000 | 0.000000 | 0.000000 | 0.061529 | 0.063575 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 574 | 0.029322 | 0.000000 | 0.000000 | 0.084511 | 0.042705 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 575 | 0.137268 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2417 rows × 5000 columns
```python
a = len(TFIDF_L2)
TFIDF_L2 = TFIDF_L2.dropna()
b = len(TFIDF_L2)
bag_loss = a - b
bag_loss
```

549
```python
TFIDF_L2
```

| screenplay_id | scene_id | is | are | have | be | do | know | was | get | dont | were | ... | amble | angers | amfl | andthen | andmr | andi | anchoring | ample | appreciates | appreciative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| joy | 1 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0.000000 | 0.067258 | 0.035151 | 0.071038 | 0.107690 | 0.076506 | 0.000000 | 0.000000 | 0.040008 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 6 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 7 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | 0.114021 | 0.000000 | 0.000000 | 0.109544 | 0.110709 | 0.117976 | 0.059336 | 0.061310 | 0.061695 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 572 | 0.126864 | 0.057698 | 0.000000 | 0.000000 | 0.000000 | 0.065632 | 0.132038 | 0.068215 | 0.000000 | 0.072352 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 573 | 0.000000 | 0.000000 | 0.056207 | 0.000000 | 0.000000 | 0.000000 | 0.061529 | 0.063575 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 574 | 0.029322 | 0.000000 | 0.000000 | 0.084511 | 0.042705 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 575 | 0.137268 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1868 rows × 5000 columns
```python
COV = TFIDF_L2.cov()  # This also centers the vectors
COV.head()
```

| term_str | is | are | have | be | do | know | was | get | dont | were | ... | amble | angers | amfl | andthen | andmr | andi | anchoring | ample | appreciates | appreciative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| is | 0.002642 | 0.000293 | 0.000216 | 0.000154 | 0.000074 | 0.000178 | -0.000003 | 0.000062 | 0.000051 | 0.000086 | ... | -8.438301e-07 | 0.000004 | 0.000012 | 0.000022 | 0.000002 | -0.000006 | -6.896139e-07 | -0.000006 | -0.000003 | 0.000001 |
| are | 0.000293 | 0.001852 | 0.000193 | 0.000128 | 0.000100 | 0.000089 | 0.000026 | 0.000192 | 0.000126 | 0.000019 | ... | -4.966656e-07 | -0.000001 | -0.000002 | -0.000003 | 0.000002 | -0.000003 | -9.298732e-07 | -0.000004 | 0.000004 | 0.000012 |
| have | 0.000216 | 0.000193 | 0.001570 | 0.000210 | 0.000266 | 0.000172 | 0.000229 | 0.000086 | 0.000283 | 0.000126 | ... | 5.719707e-07 | -0.000001 | 0.000003 | -0.000003 | 0.000002 | 0.000013 | -7.939805e-07 | -0.000003 | 0.000004 | -0.000002 |
| be | 0.000154 | 0.000128 | 0.000210 | 0.001928 | 0.000179 | 0.000185 | 0.000178 | 0.000020 | 0.000193 | 0.000086 | ... | -4.186387e-07 | 0.000002 | -0.000001 | -0.000002 | -0.000001 | 0.000005 | -7.837887e-07 | -0.000003 | -0.000002 | -0.000002 |
| do | 0.000074 | 0.000100 | 0.000266 | 0.000179 | 0.002052 | 0.000334 | 0.000255 | 0.000271 | 0.000291 | 0.000040 | ... | -4.500072e-07 | 0.000002 | -0.000002 | -0.000003 | 0.000002 | 0.000005 | -8.425179e-07 | -0.000003 | 0.000004 | -0.000002 |
5 rows × 5000 columns
```python
from scipy.linalg import eigh

eig_vals, eig_vecs = eigh(COV)
EIG_VEC = pd.DataFrame(eig_vecs, index=COV.index, columns=COV.index)
EIG_VAL = pd.DataFrame(eig_vals, index=COV.index, columns=['eig_val'])
EIG_VAL.index.name = 'term_str'

EIG_VEC.iloc[:10, :10].style.background_gradient(cmap=colors)
```

| term_str | is | are | have | be | do | know | was | get | dont | were |
|---|---|---|---|---|---|---|---|---|---|---|
| is | -0.008740 | 0.014468 | -0.010531 | 0.023358 | -0.019501 | 0.047186 | -0.058522 | 0.059552 | 0.051914 | 0.059551 |
| are | 0.001514 | 0.018318 | -0.013984 | 0.051993 | -0.038404 | 0.103738 | -0.106298 | 0.064809 | 0.054640 | 0.005299 |
| have | 0.000613 | 0.001905 | 0.000246 | -0.015984 | 0.023679 | -0.040493 | 0.056038 | -0.027658 | 0.007505 | 0.045481 |
| be | -0.001078 | -0.000148 | 0.001307 | 0.001813 | -0.000114 | 0.001185 | -0.001046 | -0.000276 | 0.002085 | -0.001345 |
| do | -0.006433 | -0.004145 | -0.007857 | 0.015346 | 0.004382 | 0.023060 | -0.011600 | 0.009771 | 0.012486 | -0.008574 |
| know | -0.003377 | 0.006185 | 0.003028 | 0.018620 | -0.018843 | 0.027605 | -0.019937 | 0.006014 | 0.009631 | -0.022318 |
| was | -0.005365 | 0.017261 | -0.001275 | 0.018683 | -0.002109 | 0.016016 | -0.032710 | 0.016248 | 0.029140 | 0.023536 |
| get | 0.008093 | -0.013423 | -0.006766 | -0.016630 | 0.005738 | -0.015421 | 0.001010 | 0.030972 | 0.028472 | 0.008850 |
| dont | -0.009942 | 0.010504 | -0.012400 | -0.005770 | 0.013332 | 0.009641 | -0.001330 | 0.008068 | -0.006390 | 0.000242 |
| were | 0.002320 | 0.006897 | -0.006384 | 0.005637 | 0.001236 | -0.002112 | -0.012179 | -0.001382 | -0.008316 | -0.007144 |
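One detail worth remembering about `scipy.linalg.eigh` (used above): it returns eigenvalues in ascending order, which is why the components are later sorted by `eig_val` descending before the top K are picked. A toy check on a small symmetric matrix:

```python
import numpy as np
from scipy.linalg import eigh

# Small symmetric matrix standing in for the term covariance matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = eigh(A)   # eigenvalues come back ascending: [1., 3.]

# Reverse so the largest-variance direction comes first
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
print(vals)  # [3. 1.]

# Sanity check: A reconstructs from its eigendecomposition
assert np.allclose(vecs @ np.diag(vals) @ vecs.T, A)
```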
EIG_VEC_PAIRS = EIG_VEC.stack().sort_values(ascending=False).to_frame('covariance')
EIG_VEC_PAIRS.index.names = ['term1', 'term2']
EIG_VEC_PAIRS.head(20)
|   |   | covariance |
|---|---|---|
| term1 | term2 | |
| more | appreciates | 0.993390 |
| continued | appreciative | 0.991010 |
| tennis | genuine | 0.608465 |
| os | andi | 0.559930 |
| phone | andi | 0.475615 |
| office | andthen | 0.452633 |
| bomb | geek | 0.434562 |
| s | anchoring | 0.424377 |
| title | antagonize | 0.418314 |
| fries | teammates | 0.403636 |
| hotter | girlish | 0.402058 |
| loan | flattery | 0.391117 |
| is | andmr | 0.381882 |
| ticket | apples | 0.372375 |
| seats | fixes | 0.362034 |
| truck | have | 0.350836 |
| statue | irritated | 0.348182 |
| office | amfl | 0.345516 |
| right | amble | 0.337294 |
| zero | swear | 0.337101 |
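The pairs table above is produced by `stack()`, which pivots the square eigenvector matrix into a long frame keyed by a (term1, term2) MultiIndex. A minimal sketch on a 2 × 2 toy matrix:

```python
import pandas as pd

# 2x2 toy matrix standing in for EIG_VEC
M = pd.DataFrame([[1.0, 0.2],
                  [0.2, 1.0]], index=['a', 'b'], columns=['a', 'b'])

# Long format: one row per (row-label, column-label) pair, largest values first
pairs = M.stack().sort_values(ascending=False).to_frame('value')
pairs.index.names = ['term1', 'term2']
print(pairs.shape)  # (4, 1)
```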
EIG_VEC_PAIRS.sample(10000).sort_values('covariance', ascending=False).plot(rot=45, style='.', figsize=(10,5));

Select Principal Components¶
EIG_PAIRS = EIG_VAL.join(EIG_VEC.T)
EIG_PAIRS.sort_values('eig_val', ascending=False).head(10)
|   | eig_val | is | are | have | be | do | know | was | get | dont | ... | amble | angers | amfl | andthen | andmr | andi | anchoring | ample | appreciates | appreciative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| term_str | |||||||||||||||||||||
| appreciative | 0.083012 | -0.036719 | -0.021617 | -0.018914 | -0.018489 | -0.019728 | -0.015675 | -0.022285 | -0.016027 | -0.015902 | ... | -0.000027 | -0.000104 | -0.000109 | -0.000170 | -0.000088 | -0.000215 | -0.000053 | -0.000196 | -0.000149 | -0.000148 |
| appreciates | 0.018867 | -0.032156 | -0.012144 | -0.003615 | 0.000054 | -0.010020 | -0.002721 | -0.010583 | -0.007981 | -0.004658 | ... | -0.000043 | 0.000114 | -0.000175 | -0.000299 | -0.000123 | -0.000317 | -0.000090 | -0.000337 | -0.000207 | -0.000253 |
| ample | 0.008506 | -0.081056 | -0.066555 | -0.120606 | -0.101684 | -0.153571 | -0.154922 | -0.199971 | -0.067697 | -0.039216 | ... | 0.000099 | -0.002738 | -0.000473 | 0.000843 | -0.000113 | -0.000673 | 0.000212 | 0.001564 | -0.003335 | -0.000869 |
| anchoring | 0.005048 | -0.098747 | -0.089643 | -0.140463 | -0.150119 | -0.193075 | -0.157450 | -0.276807 | -0.093192 | -0.255633 | ... | -0.000189 | 0.001625 | -0.000636 | 0.002254 | -0.001338 | -0.002328 | 0.000062 | 0.002040 | 0.000097 | -0.002933 |
| andi | 0.004505 | 0.315112 | 0.133692 | 0.031232 | 0.015433 | 0.022448 | -0.011864 | -0.152727 | -0.018499 | -0.023347 | ... | -0.000175 | 0.000442 | 0.000435 | 0.001667 | 0.000240 | -0.002409 | 0.000673 | -0.001708 | 0.000011 | -0.000950 |
| andmr | 0.003949 | 0.381882 | 0.140767 | 0.041358 | 0.039815 | -0.082378 | 0.025372 | -0.257427 | 0.016483 | -0.016768 | ... | 0.000014 | 0.000324 | 0.001384 | 0.004030 | 0.000604 | -0.001966 | 0.000379 | 0.000199 | 0.001144 | 0.001868 |
| andthen | 0.003755 | -0.166583 | -0.187275 | -0.073125 | -0.076670 | -0.014912 | 0.014580 | 0.284799 | -0.041987 | -0.093919 | ... | -0.000050 | 0.000170 | -0.000922 | -0.000850 | 0.000279 | -0.003045 | 0.000749 | -0.002401 | -0.000189 | 0.002314 |
| amfl | 0.003695 | 0.034432 | -0.015104 | 0.034329 | -0.011472 | 0.082615 | 0.083550 | -0.028582 | 0.037542 | 0.092059 | ... | -0.000008 | -0.000310 | 0.002278 | 0.002880 | -0.000675 | 0.005466 | 0.000391 | 0.001184 | 0.002373 | -0.003623 |
| angers | 0.003542 | 0.094634 | -0.065145 | -0.030695 | 0.019852 | -0.066882 | -0.031029 | 0.084035 | -0.060636 | 0.041990 | ... | 0.000219 | 0.000049 | 0.000115 | 0.003336 | 0.000495 | 0.000302 | -0.000106 | 0.001895 | 0.000641 | 0.000978 |
| amble | 0.003458 | -0.019989 | 0.062340 | 0.028161 | 0.030435 | 0.030439 | 0.040775 | -0.018104 | -0.004505 | 0.026517 | ... | -0.000106 | 0.000272 | 0.001943 | -0.001266 | 0.000312 | 0.002206 | -0.000888 | -0.001092 | -0.001645 | 0.003131 |
10 rows × 5001 columns
EIG_PAIRS['exp_var'] = np.round((EIG_PAIRS.eig_val / EIG_PAIRS.eig_val.sum()) * 100, 2)
EIG_PAIRS.exp_var.sort_values(ascending=False).head().plot.bar(rot=45);

Picking Top K Components¶
COMPS = EIG_PAIRS.sort_values('exp_var', ascending=False).head(10).reset_index(drop=True)
COMPS.index = ["PC{}".format(i) for i in COMPS.index.tolist()]
COMPS.index.name = 'pc_id'
COMPS
|   | eig_val | is | are | have | be | do | know | was | get | dont | ... | angers | amfl | andthen | andmr | andi | anchoring | ample | appreciates | appreciative | exp_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pc_id | |||||||||||||||||||||
| PC0 | 0.083012 | -0.036719 | -0.021617 | -0.018914 | -0.018489 | -0.019728 | -0.015675 | -0.022285 | -0.016027 | -0.015902 | ... | -0.000104 | -0.000109 | -0.000170 | -0.000088 | -0.000215 | -0.000053 | -0.000196 | -0.000149 | -0.000148 | 8.48 |
| PC1 | 0.018867 | -0.032156 | -0.012144 | -0.003615 | 0.000054 | -0.010020 | -0.002721 | -0.010583 | -0.007981 | -0.004658 | ... | 0.000114 | -0.000175 | -0.000299 | -0.000123 | -0.000317 | -0.000090 | -0.000337 | -0.000207 | -0.000253 | 1.93 |
| PC2 | 0.008506 | -0.081056 | -0.066555 | -0.120606 | -0.101684 | -0.153571 | -0.154922 | -0.199971 | -0.067697 | -0.039216 | ... | -0.002738 | -0.000473 | 0.000843 | -0.000113 | -0.000673 | 0.000212 | 0.001564 | -0.003335 | -0.000869 | 0.87 |
| PC3 | 0.005048 | -0.098747 | -0.089643 | -0.140463 | -0.150119 | -0.193075 | -0.157450 | -0.276807 | -0.093192 | -0.255633 | ... | 0.001625 | -0.000636 | 0.002254 | -0.001338 | -0.002328 | 0.000062 | 0.002040 | 0.000097 | -0.002933 | 0.52 |
| PC4 | 0.004505 | 0.315112 | 0.133692 | 0.031232 | 0.015433 | 0.022448 | -0.011864 | -0.152727 | -0.018499 | -0.023347 | ... | 0.000442 | 0.000435 | 0.001667 | 0.000240 | -0.002409 | 0.000673 | -0.001708 | 0.000011 | -0.000950 | 0.46 |
| PC5 | 0.003949 | 0.381882 | 0.140767 | 0.041358 | 0.039815 | -0.082378 | 0.025372 | -0.257427 | 0.016483 | -0.016768 | ... | 0.000324 | 0.001384 | 0.004030 | 0.000604 | -0.001966 | 0.000379 | 0.000199 | 0.001144 | 0.001868 | 0.40 |
| PC6 | 0.003755 | -0.166583 | -0.187275 | -0.073125 | -0.076670 | -0.014912 | 0.014580 | 0.284799 | -0.041987 | -0.093919 | ... | 0.000170 | -0.000922 | -0.000850 | 0.000279 | -0.003045 | 0.000749 | -0.002401 | -0.000189 | 0.002314 | 0.38 |
| PC7 | 0.003695 | 0.034432 | -0.015104 | 0.034329 | -0.011472 | 0.082615 | 0.083550 | -0.028582 | 0.037542 | 0.092059 | ... | -0.000310 | 0.002278 | 0.002880 | -0.000675 | 0.005466 | 0.000391 | 0.001184 | 0.002373 | -0.003623 | 0.38 |
| PC8 | 0.003542 | 0.094634 | -0.065145 | -0.030695 | 0.019852 | -0.066882 | -0.031029 | 0.084035 | -0.060636 | 0.041990 | ... | 0.000049 | 0.000115 | 0.003336 | 0.000495 | 0.000302 | -0.000106 | 0.001895 | 0.000641 | 0.000978 | 0.36 |
| PC9 | 0.003458 | -0.019989 | 0.062340 | 0.028161 | 0.030435 | 0.030439 | 0.040775 | -0.018104 | -0.004505 | 0.026517 | ... | 0.000272 | 0.001943 | -0.001266 | 0.000312 | 0.002206 | -0.000888 | -0.001092 | -0.001645 | 0.003131 | 0.35 |
10 rows × 5002 columns
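The `exp_var` column computed above is just each eigenvalue expressed as a percentage of the eigenvalue total. On toy numbers (stand-ins for the leading eigenvalues in the table):

```python
import numpy as np

eig_val = np.array([0.083, 0.019, 0.0085])  # illustrative eigenvalues
exp_var = np.round(eig_val / eig_val.sum() * 100, 2)
print(exp_var)  # percentages of total variance, summing to ~100
```

Note that even the first component here explains only about 8.5% of the variance in the real data, which is typical for sparse high-dimensional text features.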
# Loadings
LOADINGS = COMPS[COV.index].T
LOADINGS.index.name = 'term_str'
LOADINGS.head(10).style.background_gradient(cmap=colors)
| pc_id | PC0 | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 |
|---|---|---|---|---|---|---|---|---|---|---|
| term_str | ||||||||||
| is | -0.036719 | -0.032156 | -0.081056 | -0.098747 | 0.315112 | 0.381882 | -0.166583 | 0.034432 | 0.094634 | -0.019989 |
| are | -0.021617 | -0.012144 | -0.066555 | -0.089643 | 0.133692 | 0.140767 | -0.187275 | -0.015104 | -0.065145 | 0.062340 |
| have | -0.018914 | -0.003615 | -0.120606 | -0.140463 | 0.031232 | 0.041358 | -0.073125 | 0.034329 | -0.030695 | 0.028161 |
| be | -0.018489 | 0.000054 | -0.101684 | -0.150119 | 0.015433 | 0.039815 | -0.076670 | -0.011472 | 0.019852 | 0.030435 |
| do | -0.019728 | -0.010020 | -0.153571 | -0.193075 | 0.022448 | -0.082378 | -0.014912 | 0.082615 | -0.066882 | 0.030439 |
| know | -0.015675 | -0.002721 | -0.154922 | -0.157450 | -0.011864 | 0.025372 | 0.014580 | 0.083550 | -0.031029 | 0.040775 |
| was | -0.022285 | -0.010583 | -0.199971 | -0.276807 | -0.152727 | -0.257427 | 0.284799 | -0.028582 | 0.084035 | -0.018104 |
| get | -0.016027 | -0.007981 | -0.067697 | -0.093192 | -0.018499 | 0.016483 | -0.041987 | 0.037542 | -0.060636 | -0.004505 |
| dont | -0.015902 | -0.004658 | -0.039216 | -0.255633 | -0.023347 | -0.016768 | -0.093919 | 0.092059 | 0.041990 | 0.026517 |
| were | -0.014546 | -0.010964 | -0.045979 | -0.177439 | -0.054759 | 0.011618 | -0.022318 | -0.045951 | -0.029490 | 0.044143 |
top_terms = []
for i in range(10):
    for j in [0, 1]:
        comp_str = ' '.join(LOADINGS.sort_values(f'PC{i}', ascending=bool(j)).head(10).index.to_list())
        top_terms.append((f"PC{i}", j, comp_str))
COMP_GLOSS = pd.DataFrame(top_terms).set_index([0, 1]).unstack()
COMP_GLOSS.index.name = 'comp_id'
COMP_GLOSS.columns = COMP_GLOSS.columns.droplevel(0)
COMP_GLOSS = COMP_GLOSS.rename(columns={0: 'pos', 1: 'neg'})
COMP_GLOSS
|   | pos | neg |
|---|---|---|
| comp_id | ||
| PC0 | continued slots noó shown objection draw misap... | is more s was are do have be t get |
| PC1 | more continued t lucky years twenty liked s co... | is os phone are office room front were was car |
| PC2 | car phone office os vo front kitchen home driv... | s t don was re m know do have gonna |
| PC3 | s t car house door re don sits waiting front | was dont do were know be have beat did had |
| PC4 | os phone is are cheerleaders rings map s stand... | was car vo did were food nods pulls said day |
| PC5 | is office computer are door vo day students ri... | os was did cheerleaders t do car watch drive k... |
| PC6 | office phone was day students did had same sai... | are is car looks map standing dont watch line ... |
| PC7 | office day time car home door front watch look... | students computer right left male news bunch s... |
| PC8 | looks is map was beat students right door loan... | vo phone car os people customers do are get voice |
| PC9 | right office students car watch computer os ch... | phone news hear house door did title anything ... |
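The gloss loop above can also be expressed with `nlargest`/`nsmallest`, which makes the positive/negative split of each component explicit. A sketch on toy loadings (term and column names are illustrative):

```python
import pandas as pd

# Toy loadings table standing in for LOADINGS (terms x components)
load = pd.DataFrame({'PC0': [0.9, -0.8, 0.1],
                     'PC1': [0.2, 0.3, -0.9]},
                    index=['car', 'phone', 'office'])

pos = ' '.join(load['PC0'].nlargest(2).index)   # highest-loading terms
neg = ' '.join(load['PC0'].nsmallest(2).index)  # lowest-loading terms
print(pos, '|', neg)  # car office | phone office
```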
DCM¶
DCM = TFIDF_L2.dot(COMPS[COV.index].T)
DCM = DCM.join(LIB[LIB_COLS], on='screenplay_id')
DCM['doc'] = DCM.apply(lambda x: f"{x.title} {str(x.name[1]).zfill(2)}", 1)
DCM.doc
screenplay_id       scene_id
joy 1 Joy 01
2 Joy 02
4 Joy 04
6 Joy 06
7 Joy 07
...
the_social_network 569 The Social Network 569
572 The Social Network 572
573 The Social Network 573
574 The Social Network 574
575 The Social Network 575
Name: doc, Length: 1868, dtype: object
DCM.head()
|   |   | PC0 | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | title | genre | year | doc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| screenplay_id | scene_id | ||||||||||||||
| joy | 1 | -0.009742 | -0.008930 | 0.057842 | 0.044163 | -0.012808 | -0.088922 | -0.105269 | 0.061762 | 0.075649 | 0.080738 | Joy | comdey/drama | 2015 | Joy 01 |
| 2 | -0.026036 | 0.035024 | -0.077992 | -0.172449 | 0.015656 | -0.044006 | -0.114761 | 0.045198 | 0.023700 | 0.105914 | Joy | comdey/drama | 2015 | Joy 02 | |
| 4 | -0.005343 | -0.007115 | 0.026311 | 0.068166 | -0.069754 | -0.057152 | -0.063562 | 0.058273 | -0.079122 | 0.118392 | Joy | comdey/drama | 2015 | Joy 04 | |
| 6 | -0.013547 | -0.014493 | 0.063510 | 0.127740 | -0.068335 | -0.095264 | -0.138851 | 0.174558 | -0.058543 | 0.340153 | Joy | comdey/drama | 2015 | Joy 06 | |
| 7 | -0.009002 | -0.008628 | 0.033626 | 0.067546 | -0.039601 | -0.033108 | -0.082383 | 0.081205 | 0.000210 | 0.124174 | Joy | comdey/drama | 2015 | Joy 07 |
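The `DCM` step above projects each L2-normalized document vector onto the retained components via a dot product; the shapes work out as (docs × terms) · (terms × components). A minimal shape check with random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 5))   # 6 docs x 5 terms (stand-in for TFIDF_L2)
V = rng.random((2, 5))   # 2 components x 5 term loadings (stand-in for COMPS[COV.index])

DCM = X @ V.T            # 6 docs x 2 component scores
print(DCM.shape)  # (6, 2)
```

Each row of the result is one scene's coordinates in component space, which is what the scatter plots below visualize.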
PCA Visualizations¶
def vis_pcs(M, a, b, label='title', hover_name='genre', symbol=None, size=None):
    M = M.reset_index()
    return px.scatter(
        M, f"PC{a}", f"PC{b}",
        color=label,
        hover_name=hover_name,
        symbol=symbol if symbol in M.columns else None,
        size=size if size in M.columns else None,
        marginal_x='box',
        height=800
    )

def vis_loadings(a=0, b=1, hover_name='term_str'):
    # X = LOADINGS.join(VOCAB)
    X = LOADINGS.join(VSHORT)
    return px.scatter(X.reset_index(), f"PC{a}", f"PC{b}",
                      text='term_str', size='i', color='max_pos_group',
                      marginal_x='box', height=800)

DCM
|   |   | PC0 | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9 | title | genre | year | doc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| screenplay_id | scene_id | ||||||||||||||
| joy | 1 | -0.009742 | -0.008930 | 0.057842 | 0.044163 | -0.012808 | -0.088922 | -0.105269 | 0.061762 | 0.075649 | 0.080738 | Joy | comdey/drama | 2015 | Joy 01 |
| 2 | -0.026036 | 0.035024 | -0.077992 | -0.172449 | 0.015656 | -0.044006 | -0.114761 | 0.045198 | 0.023700 | 0.105914 | Joy | comdey/drama | 2015 | Joy 02 | |
| 4 | -0.005343 | -0.007115 | 0.026311 | 0.068166 | -0.069754 | -0.057152 | -0.063562 | 0.058273 | -0.079122 | 0.118392 | Joy | comdey/drama | 2015 | Joy 04 | |
| 6 | -0.013547 | -0.014493 | 0.063510 | 0.127740 | -0.068335 | -0.095264 | -0.138851 | 0.174558 | -0.058543 | 0.340153 | Joy | comdey/drama | 2015 | Joy 06 | |
| 7 | -0.009002 | -0.008628 | 0.033626 | 0.067546 | -0.039601 | -0.033108 | -0.082383 | 0.081205 | 0.000210 | 0.124174 | Joy | comdey/drama | 2015 | Joy 07 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| the_social_network | 569 | -0.040815 | -0.024609 | -0.139333 | -0.186990 | 0.060854 | 0.099896 | -0.030740 | 0.066110 | -0.025069 | 0.040497 | The Social Network | drama/historical_fiction | 2009 | The Social Network 569 |
| 572 | -0.030844 | -0.016715 | -0.100461 | -0.139302 | 0.049060 | 0.142616 | 0.076705 | 0.102582 | 0.011328 | 0.077599 | The Social Network | drama/historical_fiction | 2009 | The Social Network 572 | |
| 573 | -0.020800 | -0.011092 | -0.080757 | -0.107443 | -0.035434 | -0.023990 | -0.002382 | 0.025048 | -0.016980 | 0.026415 | The Social Network | drama/historical_fiction | 2009 | The Social Network 573 | |
| 574 | -0.020562 | -0.013363 | -0.064003 | -0.057032 | 0.025329 | 0.071072 | -0.034069 | -0.057014 | -0.008166 | 0.041210 | The Social Network | drama/historical_fiction | 2009 | The Social Network 574 | |
| 575 | -0.014515 | -0.011432 | -0.010473 | 0.001128 | 0.028815 | 0.074049 | -0.043474 | -0.013858 | -0.000349 | -0.036334 | The Social Network | drama/historical_fiction | 2009 | The Social Network 575 |
1868 rows × 14 columns
TFIDF_L2.to_csv(f"{output_dir}/{data_prefix}-TFIDF_chap_L2.csv")
DCM.iloc[:,:10].to_csv(f"{output_dir}/{data_prefix}-PCA_DCM_chap.csv")
COMPS.iloc[:,[0,-1]].to_csv(f"{output_dir}/{data_prefix}-PCA_COMPS_chap.csv")
LOADINGS.to_csv(f"{output_dir}/{data_prefix}-PCA_TCM_chap.csv")
LIB.to_csv(f"{output_dir}/{data_prefix}-LIB.csv")

import matplotlib.pyplot as plt
import seaborn as sns

def vis_pcs_matplotlib(M, a, b, label='title', hover_name='genre', symbol=None, size=None):
    df = M.reset_index()
    plt.figure(figsize=(16, 12))
    # Use seaborn scatterplot to support grouping
    sns.scatterplot(
        data=df,
        x=f"PC{a}", y=f"PC{b}",
        hue=label,
        style=symbol if symbol and symbol in df.columns else None,
        size=size if size and size in df.columns else None,
        sizes=(20, 200),
        alpha=0.8
    )
    # Optionally annotate each point with its doc label if available
    if 'doc' in df.columns:
        for _, row in df.iterrows():
            plt.text(row[f"PC{a}"], row[f"PC{b}"], str(row['doc']),
                     fontsize=8, ha='center', va='bottom')
    plt.xlabel(f"PC{a}")
    plt.ylabel(f"PC{b}")
    plt.title(f"PC{a} vs PC{b} Projection")
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.tight_layout()
    plt.grid(True)
    plt.show()

vis_pcs_matplotlib(DCM, 0, 1, label='title', hover_name='genre', symbol='genre', size=None)
vis_pcs_matplotlib(DCM, 2, 3, label='title', hover_name='genre', symbol='genre', size=None)

pip install adjustText
Defaulting to user installation because normal site-packages is not writeable
Collecting adjustText
  Downloading adjustText-1.3.0-py3-none-any.whl (13 kB)
Installing collected packages: adjustText
Successfully installed adjustText-1.3.0
Note: you may need to restart the kernel to use updated packages.
import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text

def vis_loadings_matplotlib(LOADINGS, VSHORT, a=0, b=1, size_col='i', label_col='term_str', color_col='max_pos_group'):
    # Merge and reset index
    df = LOADINGS.join(VSHORT).reset_index()
    # Setup figure
    plt.figure(figsize=(14, 10))
    sns.scatterplot(
        data=df,
        x=f"PC{a}", y=f"PC{b}",
        size=size_col,
        sizes=(20, 300),
        hue=color_col,
        palette='tab10',
        alpha=0.7,
        legend='brief'
    )
    # Add labels
    texts = []
    for _, row in df.iterrows():
        texts.append(plt.text(row[f"PC{a}"], row[f"PC{b}"], row[label_col],
                              fontsize=9, ha='center', va='bottom'))
    # Reduce overlap
    adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))
    # Labels and layout
    plt.title(f"Loadings Scatter (PC{a} vs PC{b})")
    plt.xlabel(f"PC{a}")
    plt.ylabel(f"PC{b}")
    plt.tight_layout()
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()

vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)
0.70763161] 3618 [-0.0965405 0.60132142] 3865 [ 0.89880819 -0.79504713] 4134 [ 0.45969036 -0.51700163] 3180 [-0.81718752 -0.45742718] 3987 [-0.24608597 0.97096467] 4209 [0.77754745 0.39425428] 4237 [0.67512466 0.44469638] 3261 [0.96269777 0.94137223] 3530 [0.40105703 0.500993 ] 3975 [-0.49644236 0.21282784] 3277 [-0.6080488 -0.22644799] 3585 [0.05973749 0.01411777] 4143 [ 0.23964717 -0.5584029 ] 4282 [-0.1196725 -0.17187243] 4300 [0.79294475 0.64982047] 3952 [0.25912551 0.04386337] 4445 [ 0.95038686 -0.03779606] 3289 [ 0.69114383 -0.54855183] 3435 [0.45998955 0.26598708] 3588 [0.52674554 0.83607259] 4437 [0.71053297 0.53100755] 4020 [-0.89003339 0.45220293] 4424 [ 0.87947318 -0.04559412] 4663 [-0.73198264 -0.06560855] 4972 [-0.4225516 -0.00455398] 3902 [-0.40896859 0.44663245] 4167 [-0.90194722 0.37277492] 4381 [-0.67134893 0.99810533] 3304 [-0.32792547 0.31298547] 3752 [ 0.18167856 -0.68753018] 3084 [-0.78440569 -0.60323872] 3481 [0.31266563 0.32505241] 3114 [-0.40194192 -0.87855715] 3346 [-0.87452377 0.69713677] 3494 [0.60980354 0.13694393] 3616 [ 0.87194758 -0.84558804] 4195 [-0.45131442 -0.12461063] 4339 [-0.80996119 -0.66707876] 4340 [ 0.61628545 -0.10086384] 3314 [-0.29704362 0.91870584] 3382 [-0.27697262 -0.72516382] 3790 [ 0.63654318 -0.12875614] 3872 [ 0.37387005 -0.66004581] 4517 [0.99893835 0.6315212 ] 3118 [ 0.80463643 -0.55193734] 4504 [0.47677462 0.4749075 ] 3556 [ 0.71326087 -0.27234797] 4253 [-0.89698202 0.61938065] 3454 [-0.29820651 0.88939573] 3880 [-0.81552291 -0.14349785] 3233 [ 0.62048795 -0.39811256] 3368 [-0.99486665 -0.08796497] 3706 [-0.36059792 -0.14142927] 3725 [-0.15891506 -0.77009939] 3764 [ 0.00047841 -0.23759048] 3255 [-0.51672908 0.70090976] 4368 [0.68556123 0.06415327] 4079 [-0.18092837 0.17781876] 4093 [-0.97286224 0.22659008] 3043 [-0.40041489 -0.90174127] 3639 [0.58425725 0.18964066] 4468 [0.20934995 0.54674683] 3696 [0.80253368 0.58216247] 3891 [0.96215764 0.40754959] 4432 [-0.62376779 -0.02354909] 4460 [ 0.59461042 -0.19328171] 
3516 [0.42617051 0.56216768] 3627 [-0.4630203 -0.21063809] 3245 [-0.34052333 0.7041012 ] 3452 [-0.15182011 -0.94676982] 4297 [-0.29312327 0.44147509] 3134 [0.88552613 0.28388322] 3659 [ 0.430062 -0.64756554] 3455 [-0.29531238 0.62827556] 4503 [-0.16968341 0.73126124] 3161 [ 0.96532908 -0.09607339] 3462 [-0.61124273 0.30597944] 4450 [0.55492836 0.53817256] 3176 [-0.81896186 -0.59963371] 3189 [-0.38059589 -0.82322981] 3617 [0.20052836 0.61628671] 3598 [ 0.7570935 -0.47456021] 3783 [-0.72809369 -0.96331216] 3822 [-0.26584832 -0.69191085] 4336 [0.16837115 0.47156434] 4397 [-0.0474948 -0.02781288] 3318 [-0.19776036 -0.04720657] 3506 [-0.50299954 -0.65118959] 3761 [ 0.94551588 -0.4386438 ] 4448 [-0.04590342 0.63111224] 3050 [ 0.82296517 -0.03612126] 3088 [-0.3495818 -0.50647538] 3101 [ 0.3558828 -0.14773893] 3361 [ 0.66861558 -0.85906863] 3949 [ 0.8304011 -0.77251592] 4280 [-0.54417133 -0.45224593] 4435 [ 0.71491901 -0.56346858] 3484 [ 0.8287521 -0.75301858] 3676 [ 0.60549545 -0.69735357] 2440 [-0.53957632 -0.22942914] 2983 [-0.13332021 -0.24421727] 3249 [0.21779611 0.31764502] 3544 [0.79445505 0.31202144] 3692 [ 0.71450698 -0.94629696] 3285 [ 0.02530396 -0.92210224] 3515 [-0.19556633 -0.15858316] 4140 [-0.61122528 -0.50150195] 3244 [-0.27444302 -0.98634791] 3301 [0.19822576 0.59403236] 4320 [-0.27704861 -0.71028403] 3063 [-0.89455552 0.40183994] 3167 [ 0.34234796 -0.91655032] 3473 [0.04853236 0.79793906] 3723 [-0.47784987 0.91296671] 4423 [ 0.2181828 -0.36350861] 3863 [0.54770086 0.11576876] 4485 [ 0.52633789 -0.58828071] 3055 [ 0.43578996 -0.23167322] 4285 [ 0.89842813 -0.61197013] 3419 [-0.17601893 0.90521949] 3859 [ 0.76580736 -0.42926668] 3300 [-0.21294827 0.66832715] 3403 [-0.91259546 -0.56931719] 3493 [0.65979347 0.86038941] 3541 [ 0.22271944 -0.63726321] 3038 [0.32760201 0.09872055] 3310 [-0.67241358 0.85273571] 3895 [0.07064462 0.21677937] 3495 [-0.87292944 -0.02624118] 3673 [0.31418632 0.34577326] 3202 [-0.85528168 0.6180529 ] 3501 [ 0.65774973 -0.94899523] 
4469 [-0.37236441 0.52678279] 3509 [0.93467158 0.07482088] 4156 [0.15877824 0.89421566] 3392 [0.18560699 0.07058219] 4371 [-0.96236804 -0.28261465] 3133 [0.80856753 0.82422726] 4323 [ 0.827168 -0.12050051] 3184 [-0.79148927 0.31099274] 3728 [0.34521357 0.92531626] 3892 [-0.78851837 -0.74732676] 3959 [-0.98405238 0.49648934] 3113 [-0.53197032 0.66901771] 3181 [-0.31845806 -0.15379574] 3615 [-0.29076253 -0.21580621] 4158 [-0.76281453 0.76995104] 4223 [-0.17638608 -0.55210737] 3422 [0.4494971 0.54266429] 3502 [ 0.52175128 -0.52973193] 3401 [-0.23860066 -0.01012478] 4232 [0.5829964 0.82228898] 4530 [ 0.92024538 -0.61418549] 3680 [-0.00330652 -0.58602255] 3788 [-0.14861414 0.94594743] 3712 [-0.53430336 0.60321836] 4204 [0.07334043 0.97595538] 3357 [0.93825943 0.56341625] 3630 [0.81592468 0.26509602] 2311 [0.25754223 0.31700767] 2312 [0.51160989 0.26773627] 3340 [-0.30051849 -0.38648158] 4345 [-0.48712613 -0.81481185] 3886 [ 0.0395261 -0.25182748] 4011 [-0.06039462 0.81789096] 3434 [-0.12313875 -0.03949971] 4014 [0.23223166 0.50072892] 3085 [0.00899142 0.8103856 ] 3364 [0.62748368 0.64301244] 3520 [0.6193071 0.33203584] 3940 [ 0.68863022 -0.69422372] 3139 [-0.01194198 0.59450435] 3907 [0.87716148 0.62997541] 3881 [-0.3996043 -0.12283224] 3993 [0.16314199 0.28382574] 3733 [-0.67650426 0.29919234] 4262 [0.78480325 0.34491059] 3839 [-0.76194024 0.98090704] 4274 [-0.28878246 -0.98801815] 3770 [ 0.23031334 -0.06404984] 4278 [-0.35078564 0.69999072] 1115 [0.54874355 0.70702917] 1150 [ 0.07202918 -0.80578216]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[53], line 1
----> 1 vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)

Cell In[51], line 28, in vis_loadings_matplotlib(LOADINGS, VSHORT, a, b, size_col, label_col, color_col)
     23 texts.append(plt.text(
     24     row[f"PC{a}"], row[f"PC{b}"], row[label_col],
     25     fontsize=9, ha='center', va='bottom'))
     27 # Reduce overlap
---> 28 adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))
     30 # Labels and layout
     31 plt.title(f"Loadings Scatter (PC{a} vs PC{b})")

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:724, in adjust_text(texts, x, y, objects, target_x, target_y, avoid_self, prevent_crossings, force_text, force_static, force_pull, pull_threshold, expand, max_move, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
    721 while error > 0:
    722     # expand = expands[min(i, expand_steps-1)]
    723     logger.debug(step)
--> 724     coords, error = iterate(
    725         coords,
    726         target_xy_disp_coord,
    727         static_coords,
    728         force_text=force_text,
    729         force_static=force_static,
    730         force_pull=force_pull,
    731         pull_threshold=pull_threshold,
    732         expand=expand,
    733         max_move=max_move,
    734         bbox_to_contain=ax_bbox,
    735         only_move=only_move,
    736     )
    737     if prevent_crossings:
    738         coords = remove_crossings(coords, target_xy_disp_coord, step)

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:329, in iterate(coords, target_coords, static_coords, force_text, force_static, force_pull, pull_threshold, expand, max_move, bbox_to_contain, only_move)
    315 def iterate(
    316     coords,
    317     target_coords,
    (...)
    326     only_move={"text": "xy", "static": "xy", "explode": "xy", "pull": "xy"},
    327 ):
    328     coords = random_shifts(coords, only_move.get("explode", "xy"))
--> 329     text_shifts_x, text_shifts_y = get_shifts_texts(
    330         expand_coords(coords, expand[0], expand[1])
    331     )
    332     if static_coords.shape[0] > 0:
    333         static_shifts_x, static_shifts_y = get_shifts_extra(
    334             expand_coords(coords, expand[0], expand[1]), static_coords
    335         )

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:169, in get_shifts_texts(coords)
    165 yoverlaps = overlap_intervals(
    166     coords[:, 2], coords[:, 3], coords[:, 2], coords[:, 3]
    167 )
    168 yoverlaps = yoverlaps[yoverlaps[:, 0] != yoverlaps[:, 1]]
--> 169 overlaps = yoverlaps[(yoverlaps[:, None] == xoverlaps).all(-1).any(-1)]
    170 if len(overlaps) == 0:
    171     return np.zeros((coords.shape[0])), np.zeros((coords.shape[0]))

AttributeError: 'bool' object has no attribute 'all'
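The traceback above shows `adjust_text` failing inside adjustText itself, not in the notebook's plotting code. This `'bool' object has no attribute 'all'` error is a library-internal failure seen with some adjustText/NumPy version combinations; upgrading or pinning adjustText may resolve it. As a hedged stopgap (an assumption, not the notebook's original code), the call can be wrapped so the loadings scatter still renders, just without overlap reduction:

```python
# Hypothetical fallback wrapper: try adjustText's overlap reduction, but
# degrade gracefully if the installed version raises (e.g. the
# AttributeError in get_shifts_texts shown above).
def safe_adjust_text(texts, **kwargs):
    """Return True if adjust_text ran, False if it is unavailable or
    failed (labels are left where matplotlib originally drew them)."""
    try:
        from adjustText import adjust_text
        adjust_text(texts, **kwargs)
        return True
    except Exception:  # includes the AttributeError seen in the traceback
        return False
```

Inside `vis_loadings_matplotlib`, the failing line could then become `safe_adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))`, letting the figure draw even when adjustText misbehaves.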
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[53], line 1
----> 1 vis_loadings_matplotlib(LOADINGS, VSHORT, a=2, b=3)

Cell In[51], line 28, in vis_loadings_matplotlib(LOADINGS, VSHORT, a, b, size_col, label_col, color_col)
     23 texts.append(plt.text(
     24     row[f"PC{a}"], row[f"PC{b}"], row[label_col],
     25     fontsize=9, ha='center', va='bottom'))
     27 # Reduce overlap
---> 28 adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))
     30 # Labels and layout
     31 plt.title(f"Loadings Scatter (PC{a} vs PC{b})")

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:724, in adjust_text(texts, x, y, objects, target_x, target_y, avoid_self, prevent_crossings, force_text, force_static, force_pull, pull_threshold, expand, max_move, explode_radius, ensure_inside_axes, expand_axes, only_move, ax, min_arrow_len, time_lim, iter_lim, *args, **kwargs)
    721 while error > 0:
    722     # expand = expands[min(i, expand_steps-1)]
    723     logger.debug(step)
--> 724     coords, error = iterate(
    725         coords,
    726         target_xy_disp_coord,
    727         static_coords,
    728         force_text=force_text,
    729         force_static=force_static,
    730         force_pull=force_pull,
    731         pull_threshold=pull_threshold,
    732         expand=expand,
    733         max_move=max_move,
    734         bbox_to_contain=ax_bbox,
    735         only_move=only_move,
    736     )
    737 if prevent_crossings:
    738     coords = remove_crossings(coords, target_xy_disp_coord, step)

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:329, in iterate(coords, target_coords, static_coords, force_text, force_static, force_pull, pull_threshold, expand, max_move, bbox_to_contain, only_move)
    315 def iterate(
    316     coords,
    317     target_coords,
   (...)
    326     only_move={"text": "xy", "static": "xy", "explode": "xy", "pull": "xy"},
    327 ):
    328     coords = random_shifts(coords, only_move.get("explode", "xy"))
--> 329     text_shifts_x, text_shifts_y = get_shifts_texts(
    330         expand_coords(coords, expand[0], expand[1])
    331     )
    332     if static_coords.shape[0] > 0:
    333         static_shifts_x, static_shifts_y = get_shifts_extra(
    334             expand_coords(coords, expand[0], expand[1]), static_coords
    335         )

File ~/.local/lib/python3.11/site-packages/adjustText/__init__.py:169, in get_shifts_texts(coords)
    165 yoverlaps = overlap_intervals(
    166     coords[:, 2], coords[:, 3], coords[:, 2], coords[:, 3]
    167 )
    168 yoverlaps = yoverlaps[yoverlaps[:, 0] != yoverlaps[:, 1]]
--> 169 overlaps = yoverlaps[(yoverlaps[:, None] == xoverlaps).all(-1).any(-1)]
    170 if len(overlaps) == 0:
    171     return np.zeros((coords.shape[0])), np.zeros((coords.shape[0]))

AttributeError: 'bool' object has no attribute 'all'
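The `AttributeError` originates inside adjustText's overlap test: `(yoverlaps[:, None] == xoverlaps).all(-1)` assumes the `==` comparison yields an array, but when the two arrays' shapes cannot be broadcast NumPy can fall back to the scalar `False`, and a Python `bool` has no `.all()`. Upgrading adjustText (or pinning a NumPy version it was tested against) is the usual fix. As a stopgap that avoids calling `adjust_text` at all, the sketch below is a minimal greedy label spreader; `spread_labels` is a hypothetical helper written for this notebook, not part of the adjustText API.

```python
import numpy as np

def spread_labels(y, min_gap=0.05):
    """Greedy fallback for adjust_text: walk the labels in ascending
    y-order and push each one up until it sits at least `min_gap`
    above the label placed before it. Returns adjusted y-positions
    in the original order."""
    y_adj = np.asarray(y, dtype=float).copy()
    order = np.argsort(y_adj)
    for prev, cur in zip(order[:-1], order[1:]):
        if y_adj[cur] - y_adj[prev] < min_gap:
            y_adj[cur] = y_adj[prev] + min_gap
    return y_adj
```

In `vis_loadings_matplotlib`, the adjusted values can replace the raw `row[f"PC{b}"]` positions passed to `plt.text`, keeping the marker positions themselves unchanged. It only resolves vertical collisions, which is often enough for a loadings scatter with short term labels.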